GANs

Generating faces with DCGANs

To understand how a machine can generate something it's never seen before, you first need to understand what Generative Adversarial Networks (GANs) are.

Autoencoders

Before we start talking about GANs, let's take a look at an algorithm called the "autoencoder" to distinguish between "Encoder-Decoder" and "Discriminator-Generator". An autoencoder is a data compression algorithm in which the compression and decompression functions are:

  1. Data-specific
  2. Lossy
  3. Learned automatically from examples rather than engineered by a human

Additionally, in almost all contexts where the term “autoencoder” is used, the compression and decompression functions are implemented with neural networks.

We did this with upsampling in an autoencoder, where we resized a layer with nearest-neighbor interpolation, then passed that to a convolutional layer. This network architecture performs data compression where the compression and decompression functions are learned from the data itself, not hand-engineered by humans.
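To make this concrete, here is a minimal convolutional autoencoder sketch in Keras. The layer sizes and the 28x28 grayscale input are illustrative assumptions of mine, not taken from any particular project; the decoder upsamples with nearest-neighbor interpolation and then applies a convolution, exactly as described above.

    # Minimal convolutional autoencoder sketch (illustrative sizes).
    from tensorflow.keras import layers, models

    inputs = layers.Input(shape=(28, 28, 1))

    # Encoder: compress the input into a narrow representation.
    x = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    x = layers.MaxPooling2D(2)(x)                                  # 28x28 -> 14x14
    x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
    encoded = layers.MaxPooling2D(2)(x)                            # 14x14 -> 7x7

    # Decoder: nearest-neighbor upsampling, then convolution.
    x = layers.UpSampling2D(2, interpolation='nearest')(encoded)   # 7x7 -> 14x14
    x = layers.Conv2D(8, 3, activation='relu', padding='same')(x)
    x = layers.UpSampling2D(2, interpolation='nearest')(x)         # 14x14 -> 28x28
    decoded = layers.Conv2D(1, 3, activation='sigmoid', padding='same')(x)

    autoencoder = models.Model(inputs, decoded)
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy')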

In practice, autoencoders are actually better at

  1. Image denoising
  2. Dimensionality reduction

Since we won’t talk about this architecture in this post, you may find more info at https://blog.keras.io/building-autoencoders-in-keras.html


What are Generative Adversarial Networks (GANs)?

GANs are used to generate realistic data that a machine has never seen or been trained on before. Most applications of GANs so far have been for images. There are lots of examples that have produced excellent results, including:

StackGAN: The model takes a textual description of a bird, then generates a high-resolution photo of a bird matching that description.


Pix2Pix: As the user draws very crude sketches with the mouse, the GAN generates the closest possible realistic image.


It can even change a horse in a video into a zebra. (Because the training is totally unsupervised, it changes a few things besides the horse: the tone of the environment looks more like a desert than it did in the original.)

How GANs Work

Most generative models are trained by adjusting the parameters to maximize the probability that the generator net will generate the training data set. Unfortunately, for a lot of interesting models this probability can be very difficult to compute, so most generative models get around that with some kind of approximation.

Generative Adversarial Networks (GANs) use an approximation where a second network, called the discriminator, learns to guide the generator. Put another way, we want the generator to create results good enough to fool the discriminator: the discriminator should judge the generated output to be just like the real data it was trained on.
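To make the adversarial setup concrete, here is a hedged sketch of a single GAN training step in Keras. All the names here (generator, discriminator, gan, sample_real_batch) are placeholders of mine, not code from this post; gan stands for a stacked model that runs the generator's output through a frozen discriminator (a sketch of that wiring appears near the end of this post).

    import numpy as np

    def train_step(generator, discriminator, gan, sample_real_batch,
                   batch_size=64, z_dim=100):
        # 1) Train the discriminator: real images get label 1, fakes get label 0.
        real_images = sample_real_batch(batch_size)
        noise = np.random.normal(0, 1, size=(batch_size, z_dim))
        fake_images = generator.predict(noise)
        d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
        d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

        # 2) Train the generator through the stacked model: it is rewarded
        #    when the discriminator mistakes its fakes for real data (label 1).
        noise = np.random.normal(0, 1, size=(batch_size, z_dim))
        g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))
        return d_loss_real, d_loss_fake, g_loss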

To fully understand GANs, we need to think about how payoffs and equilibrium work in the context of machine learning. If we can identify an equilibrium in the GAN game, we can use that equilibrium as a defining characteristic to understand the game. (This part is related to game theory and optimization; I will try to write a post about these topics in the future.)


Deep Convolutional GANs (DCGANs)

The idea of DCGANs is to build both networks of the GAN out of deep convolutional layers: the generator starts from a narrow input that serves as a compressed representation and expands it into a full image, while the discriminator is a convolutional classifier that judges real versus generated images.

Discriminator as a Convolutional Network

The discriminator is a convolutional network with one fully connected layer at the end that produces the sigmoid output: the final convolutional layer is flattened, then connected to a single sigmoid unit. The hidden layers use leaky ReLU activations and batch normalization on their inputs.
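A sketch of that discriminator in Keras follows. The filter counts and the 28x28x1 input shape are assumptions for illustration; the structure (strided convolutions, leaky ReLU, batch normalization, flatten, single sigmoid unit) is what the paragraph above describes.

    from tensorflow.keras import layers, models

    def build_discriminator(input_shape=(28, 28, 1), alpha=0.2):
        return models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(64, 5, strides=2, padding='same'),    # downsample
            layers.LeakyReLU(alpha),
            layers.Conv2D(128, 5, strides=2, padding='same'),
            layers.BatchNormalization(),
            layers.LeakyReLU(alpha),
            layers.Flatten(),                                   # flatten the last conv layer
            layers.Dense(1, activation='sigmoid'),              # single sigmoid unit
        ])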

Generator as a Convolutional Network

In the generator we use transposed convolutions to upsample the input all the way to the size of our target image. The generator first goes from narrow and deep to wide and flat; transposed convolutions are similar to convolutions, but flipped, so they increase spatial size instead of reducing it.
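Here is a matching generator sketch (again, the sizes are my own illustrative choices). It starts narrow and deep, with a 7x7x128 tensor reshaped from the latent vector, and uses transposed convolutions to grow wide and flat, ending as a 28x28x1 image.

    from tensorflow.keras import layers, models

    def build_generator(z_dim=100):
        return models.Sequential([
            layers.Input(shape=(z_dim,)),
            layers.Dense(7 * 7 * 128),
            layers.Reshape((7, 7, 128)),                # narrow and deep: 7x7x128
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2DTranspose(64, 5, strides=2, padding='same'),   # 7x7 -> 14x14
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2DTranspose(1, 5, strides=2, padding='same',
                                   activation='tanh'),  # wide and flat: 28x28x1
        ])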

Final layer

There are various ways to improve the generator's results, and those tricks are constantly changing; they include ReLU activations and batch normalization in the hidden layers, with a tanh activation typically used on the final layer.

Batch normalization helps the network train faster and reduces problems due to poor parameter initialization. With batch normalization, you'll be able to build deeper GAN networks and get better results. I know this is a bit confusing when you're just reading all these crazy terms, so let's take a look at the demo I created before to understand each step inside a DCGAN.
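Before the demo, here is how the generator and discriminator sketches above could be wired together and compiled; this is again an illustrative assumption rather than the demo's exact code, with the Adam learning rate and beta_1 taken from the DCGAN paper's suggestions. The discriminator is frozen inside the stacked gan model so that training on the "fool the discriminator" objective only updates the generator.

    from tensorflow.keras import models, optimizers

    discriminator = build_discriminator()
    discriminator.compile(optimizer=optimizers.Adam(2e-4, beta_1=0.5),
                          loss='binary_crossentropy')

    generator = build_generator()
    discriminator.trainable = False     # freeze D inside the stacked model
    gan = models.Sequential([generator, discriminator])
    gan.compile(optimizer=optimizers.Adam(2e-4, beta_1=0.5),
                loss='binary_crossentropy')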

