Exploring deep (latent) space 🚀

In this post, we intuitively explore deep generative models and why understanding models’ latent space is important for more interpretable, accountable AI systems.

Neelan Pather, data scientist @ Elucidate AI

In this post, we intuitively explore deep generative models and why understanding models’ latent space is important for more interpretable, accountable AI systems.

A primer on deep generative models

Deep generative models learn to reproduce the patterns observed in training data. Often these models learn in an unsupervised manner, avoiding the need for costly labels. Modern incarnations of such models have become popular in the machine learning community due to their ability to reproduce high fidelity images. Websites such as ThisPersonDoesNotExist.com have become notorious for the models’ ability to reproduce synthetic images that are often creepily indistinguishable from real people.

As the name suggests the “people” from the website are synthetic reproductions. The people in them are not real and were not captured by a photograph, but rather are generated by deep learning models which learn the patterns within some dataset. If the dataset contains common objects, the models learns the typical variation of those objects. As such, these models are able to recreate the learnt patterns and variation to produce synthetic version of data with the common objects. By learning the underlying patterns within the data, deep genitive models are able to reproduce them, resulting in life-like synthetic data.

While high fidelity reconstruction is of particular importance in the image domain, what may be of (arguably) more importance are what patterns are learnt by the models to accomplish the reconstruction task. To successfully reproduce high dimensional data, deep generative models learn the data’s structure, often in an unsupervised way. Importantly by learning the structure of the dataset, meaningful insights may be gleaned.

The deep generative models behind ThisPersonDoesNotExist.com are known as a Generative Adversarial Networks (GANs). Such models are known to reconstruct near perfect images, but fail to capture the data’s full diversity. They’re also a nightmare to train, along with other common problems.

If your goal, however, is to understand your data better, near perfect image generation may be of little interest and a model that learns a structure reflective of the whole dataset would be preferable. The most popular models for this are Variational Auto-encoders (VAEs) and are a favourite amongst researchers and practitioners.

Since VAEs are generative models; when trained, they attempt to generate an output that is reflective of the input. In order to do this, an appropriate error is needed so that the network can be repeatedly readjusted so that this error is reduced. A VAE has an encoder-decoder structure, with layers forming neural networks. Within this structure, the input is decoded and the output is decoded. When a VAE learns, for a given example, the difference between the models, input and generated output are used to create an error. This error is then used to adjust the encoder and decoder networks so that they may produce fewer errors in subsequent examples.

The images below illustrates the encoder-decoder structure in a VAE, showing how generative models can learn from various data types.

But what about the latent representation?

In the images above, the encoder is connected to the decoder via a latent representation. This latent representation is the source of the variation in the model; it is what the input is decoded into and conversely is what is decoded to generate the reproduced data. It is known as the “latent code” or “latent space”. Importantly, the generative models learn a latent space that is reflective of the underlying data. Importantly each layer of a neural network, due to the heretical nature of deep neural networks, represents higher order concepts that apply to the data.

Within a deep neural network, different layers extract different patterns. As one moves further away from the input, towards the latent representation, the patterns extracted build upon previously extracted patterns. The result is that the final layer of the model is able to capture the most conceptual complexity in the data. For example, in a deep neural network applied to photographs of different cat breeds, we see the “feature maps” at different layers, learning to look at different aspects of the photo. Initially, the textures are examined before assessing the outline. As we move further away from the input, the feature maps become less visually interpretable and begin to represent abstract ideas that are more semantically interpretable, such as the cat breed. While these higher levels may not be visually interpretable, they may align with a higher human level of understanding of the data. This article explains feature visualisation in more detail.

Heretical nature of deep neural networks mean layers learn increasingly more abstract concepts. Image source

The combination of all this information allows for the particular breed to be identified in the final “hidden” layer. We call this final hidden layer, which captures the most conceptual complexity, the “latent space”, after the Latin word meaning “hidden”.

The idea of analysis by generation is not new and popular methods such a Principal Components Analysis (PCA) have existed since 1901. The key idea is to explain the overall variation in data by the movement of several explanatory components or “factors” that characteristics the data. PCA tires to find factors that may be combined in a linear fashion. limiting the complexity of the relationship that may be captured. Auto-encoder type models such as VAE, however, use neural network to combine factors, allowing more complex relationships to be captured, as seen in the image below.

Auto-encoders allow for complex non-linear relationships between factors to be captured. Image source

Understanding the latent space amounts to understanding how a model has interpreted the underlying variability within a dataset. Understanding this allows one to better interpret how a model sees the world and ultimately how a model has made its decision. In the image below, we see how a VAE has arranged the MNIST hand-written digits in the latent space according to digit in a purely unsupervised manner. It has grouped similar variation together, resulting in similar digits being grouped together.

VAE latent space arranges MNIST according to digit in a purely unsupervised manner. Image source

Why do latent representations even matter?

In order to use deep learning models in highly regulated industries such as finance, medicine and consumer facing products, we need to be sure of how they made their decision. While being only part of the solution, exploring latent spaces allows for better understanding of machine intelligence. Furthermore, latent spaces arrange similar data points together, with notions of similarity learnt from data in an unsupervised manner. This allows us to better perceive similarities that our human intelligence may miss.


For the curious...

Happy exploring 🪐

Enjoyed this read?

Stay up to date with the latest AI news, strategies, and insights sent straight to your inbox!

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.