    Wasserstein GANs Explained

    An explanation of my interactive GAN project

    What is a Wasserstein GAN?

      A generative adversarial network (GAN) pairs a neural network that generates new data with another that tries to tell the generated data apart from the real data. In a classic GAN, the network responsible for finding the discrepancies between generated and real data is called the discriminator, and its objective is binary classification: real or fake. A Wasserstein GAN replaces the discriminator with a critic. Instead of making a binary decision, the critic rates the data on a continuous scale, so the generator receives information about how far off its output is rather than simply that it is incorrect. This gives Wasserstein GANs a major advantage over classic GANs: they are much easier to train. Classic GANs suffer from training problems, such as mode collapse and vanishing gradients, that can render the model entirely useless.
      In a WGAN, the model attempts to learn a distribution over the input data by minimizing the distance between the generated distribution and the target distribution. The generated distribution is produced by transforming a prior distribution through a neural network, the generator. The distance between the distributions is measured using features learned by a second neural network, the critic. The critic can be thought of as an encoder of the content of the data: it builds a learned representation of its input and feeds that representation into a regression head that outputs a single score. Through this representation, the critic can tell the generator, by trial and error, which characteristics of its outputs are wrong.
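
      As a concrete illustration, here is a minimal sketch of the two WGAN objectives, assuming PyTorch; `critic`, `generator`, and the batch arguments are placeholders, not the exact modules used in this project.

          import torch

          # Sketch of the WGAN objectives (PyTorch assumed; `critic` and
          # `generator` are placeholder modules).
          def critic_loss(critic, real_batch, fake_batch):
              # The critic is trained to widen the gap between its mean score
              # on real data and its mean score on generated data, so we
              # minimize the negated gap.
              return critic(fake_batch).mean() - critic(real_batch).mean()

          def generator_loss(critic, generator, latent_batch):
              # The generator is trained to raise the critic's score on its
              # outputs, receiving a continuous signal rather than a binary one.
              return -critic(generator(latent_batch)).mean()

      In practice, the critic is usually updated several times per generator update so that its scores remain a useful estimate of the distance.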

    Gradient Penalty Approach

      For the critic's scores to approximate the Wasserstein distance between the generated and target distributions, the critic must be constrained to be 1-Lipschitz, meaning its output cannot change faster than its input. One way to enforce this is to add a gradient penalty to the critic's loss function: points are sampled by interpolating between the real data and the generated data, and the critic is penalized whenever the norm of its gradient at those points deviates from 1. This also keeps the weights of the critic from overfitting to the target data. If the critic memorized the entire training set, it would consistently rate the real data highly and the generated data poorly, giving the generator no usable signal. Because the critic is trained to distinguish generated from real data, it would naturally discard the characteristics of the generated data; the penalty keeps its weights in a region that can still represent them.
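
      The penalty itself is short to implement. Below is a hedged sketch, assuming PyTorch and 4-dimensional image batches; the penalty weight of 10 follows the original WGAN-GP paper and may differ from what this project used.

          import torch

          # Sketch of the WGAN-GP gradient penalty (PyTorch assumed).
          def gradient_penalty(critic, real, fake, weight=10.0):
              # Sample points on straight lines between real and generated data.
              eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
              interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
              scores = critic(interp)
              # Gradient of the critic's scores with respect to the interpolates.
              grads = torch.autograd.grad(
                  outputs=scores, inputs=interp,
                  grad_outputs=torch.ones_like(scores),
                  create_graph=True,
              )[0]
              # Penalize gradient norms that deviate from 1 (the Lipschitz
              # constraint required for the Wasserstein distance estimate).
              grad_norm = grads.flatten(1).norm(2, dim=1)
              return weight * ((grad_norm - 1.0) ** 2).mean()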

    Data Source and Preprocessing

      For my WGAN project, I used data from Google’s Open Images dataset, filtered to the categories of images that fit the theme of my project idea. Because Wasserstein is pronounced with a “v” sound, I thought it would be entertaining to apply this model to an image set of vegetation, as WGAN would then logically be pronounced like the word “vegan,” hence the name VegGAN for the project. The image data was cropped to the center pixels, both vertically and horizontally, to the size of the generated model output; images that were too short or too thin were first resized up to fit.
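
      A minimal sketch of that preprocessing, assuming Pillow; the target resolution of 64 pixels is an illustrative guess, not the project’s actual output size.

          from PIL import Image

          # Center-crop an image to `target` x `target` pixels, upscaling
          # first if either side is too small (Pillow assumed; `target` is
          # a placeholder for the generator's output resolution).
          def preprocess(path, target=64):
              img = Image.open(path).convert("RGB")
              w, h = img.size
              if min(w, h) < target:
                  scale = target / min(w, h)
                  img = img.resize((round(w * scale), round(h * scale)))
                  w, h = img.size
              left, top = (w - target) // 2, (h - target) // 2
              return img.crop((left, top, left + target, top + target))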

    Architectural Design Choices

      For the final model, I used a residual convolutional block architecture with additive dot-product attention in each convolutional block for both the generator and critic. Residual blocks in the generator consisted of three parts. The first was an upsampling layer followed by a convolutional layer. The second followed a pre-normalization convention: a batch normalization of the block inputs, a ReLU activation, an upsampling layer, and two sets of pre-normalization convolutional layers. The third took the outputs of the second and applied a query, key, and value dot-product attention mechanism, with ReLU activations on each of the query, key, and value convolutional layers. All three outputs were added together to form the block output, as in the sketch below. Residual blocks in the critic followed a similar design, with max-pooling layers in place of upsampling layers and layer normalization in place of batch normalization.
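
      The following sketch shows one plausible reading of a generator block, assuming PyTorch; the channel counts, kernel sizes, and exact layer ordering within the second part are guesses, since the description above does not pin them down.

          import torch
          import torch.nn as nn
          import torch.nn.functional as F

          class GeneratorResBlock(nn.Module):
              def __init__(self, in_ch, out_ch):
                  super().__init__()
                  # Part 1: upsample the block input and project channels.
                  self.skip = nn.Conv2d(in_ch, out_ch, 1)
                  # Part 2: pre-normalization convolutions.
                  self.bn1 = nn.BatchNorm2d(in_ch)
                  self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
                  self.bn2 = nn.BatchNorm2d(out_ch)
                  self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
                  # Part 3: 1x1 convolutions producing queries, keys, values.
                  self.q = nn.Conv2d(out_ch, out_ch, 1)
                  self.k = nn.Conv2d(out_ch, out_ch, 1)
                  self.v = nn.Conv2d(out_ch, out_ch, 1)

              def forward(self, x):
                  part1 = self.skip(F.interpolate(x, scale_factor=2))
                  h = F.interpolate(F.relu(self.bn1(x)), scale_factor=2)
                  h = self.conv1(h)
                  part2 = self.conv2(F.relu(self.bn2(h)))
                  # Dot-product attention over the spatial positions of part 2,
                  # with ReLU on each of the query, key, and value projections.
                  b, c, hh, ww = part2.shape
                  q = F.relu(self.q(part2)).flatten(2)   # (b, c, h*w)
                  k = F.relu(self.k(part2)).flatten(2)
                  v = F.relu(self.v(part2)).flatten(2)
                  attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
                  part3 = (v @ attn.transpose(1, 2)).view(b, c, hh, ww)
                  # All three parts are added for the final block output.
                  return part1 + part2 + part3

      A critic block would swap the upsampling calls for max pooling and the batch normalization layers for layer normalization.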

    Other Approaches to Image Synthesis

      Other types of models have become increasingly prevalent in image synthesis. Most notably, variational autoencoders and denoising diffusion probabilistic models have rivaled the image quality of generative adversarial networks as measured by the Fréchet Inception Distance (FID). FID compares the mean and covariance of the activations in a layer of the Inception network, computed on the generated images versus the training data, as a measure of the similarity between the two distributions. Variational autoencoders are encoder-decoder models that encode each input into a distribution over a latent space and decode samples from that distribution back into data. Denoising diffusion probabilistic models learn to reverse a gradual noising process, iteratively removing noise from an image until a sample resembling the training distribution emerges.
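
      For reference, here is a minimal sketch of the FID computation from precomputed Inception activations, assuming NumPy and SciPy; `real_acts` and `fake_acts` are placeholder names for (n_samples, n_features) activation arrays.

          import numpy as np
          from scipy import linalg

          # Frechet Inception Distance between two sets of activations
          # (NumPy/SciPy assumed; inputs are (n_samples, n_features) arrays
          # taken from one layer of the Inception network).
          def fid(real_acts, fake_acts):
              mu_r, mu_f = real_acts.mean(axis=0), fake_acts.mean(axis=0)
              cov_r = np.cov(real_acts, rowvar=False)
              cov_f = np.cov(fake_acts, rowvar=False)
              # Matrix square root of the covariance product; sqrtm can
              # return small imaginary parts, which are discarded.
              covmean = linalg.sqrtm(cov_r @ cov_f).real
              return float(((mu_r - mu_f) ** 2).sum()
                           + np.trace(cov_r + cov_f - 2.0 * covmean))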