Generative Adversarial Networks (Original Formulation)

“Generative Adversarial Networks” (Goodfellow et al.) Paper overview.


Author: Andrea Bonvini

Published: Dec. 13, 2020

Citation: Bonvini, 2020

Let’s start with a question: what is Generative Modeling?

Generative modeling is an unsupervised learning task in machine learning: it involves automatically discovering and learning the regularities or patterns in input data, in such a way that the model can be used to generate new examples that plausibly could have been drawn from the original dataset.

GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model that we train to generate new examples, and the discriminator model that tries to classify examples as either real (from the domain) or fake (generated).

Formulation

Both D and G are conveniently chosen as MLPs. The generative process depends on two networks:

$\theta_g$ and $\theta_d$ are the network parameters, $x \in \mathbb{R}^n$ is an input image (either real or generated by $G$) and $z \in \mathbb{R}^d$ is some random noise to be fed to the generator. We suppose that $x$ is sampled from a distribution $p_{data}$ (i.e. $x \sim p_{data}(x)$) and $z$ is sampled from a distribution $p_z$ (i.e. $z \sim p_z(z)$). Our discriminator's output is to be seen as the probability that the input image comes from the data and not from the generator:

$$D(\cdot, \theta_d) : \mathbb{R}^n \rightarrow [0, 1]$$

The generator gives as output a generated image:

$$G(\cdot, \theta_g) : \mathbb{R}^d \rightarrow \mathbb{R}^n$$
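To make the two mappings concrete, here is a minimal PyTorch sketch of $D$ and $G$ as MLPs. The image size $n = 784$ (a flattened $28 \times 28$ image), the noise dimension $d = 100$, and the hidden widths are illustrative choices of mine, not values prescribed by the paper.

```python
import torch.nn as nn

n, d = 784, 100  # flattened image size and noise dimension (illustrative choices)

# D(·, θd): R^n -> [0, 1], probability that the input image is real.
D = nn.Sequential(
    nn.Linear(n, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# G(·, θg): R^d -> R^n, maps a noise vector to a generated image.
G = nn.Sequential(
    nn.Linear(d, 256), nn.ReLU(),
    nn.Linear(256, n), nn.Tanh(),
)
```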

A good discriminator is such that:

$$D(x, \theta_d) \approx 1 \quad \text{when } x \sim p_{data}(x), \qquad D(G(z, \theta_g), \theta_d) \approx 0 \quad \text{when } z \sim p_z(z)$$

Training $D$ consists in maximizing the following objective (the log-likelihood of a binary classifier, i.e. the negative binary cross-entropy):

$$\max_{\theta_d} \Big( \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x, \theta_d)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z, \theta_g), \theta_d)\big)\big] \Big)$$

where the first term rewards $D$ for assigning high probability to real samples, and the second term rewards $D$ for assigning low probability to generated samples.
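In practice the two expectations are estimated on minibatches. Below is a minimal sketch of this Monte Carlo estimate, reusing the hypothetical `D`, `G`, `n` and `d` defined above; the "real" batch here is just random noise standing in for actual data.

```python
import torch

batch_size = 64
x_real = torch.randn(batch_size, n)   # stand-in for a minibatch drawn from p_data
z = torch.randn(batch_size, d)        # noise z ~ p_z
x_fake = G(z).detach()                # generated samples (no gradient flows to G here)

eps = 1e-8  # numerical stability inside the logarithms
d_objective = (torch.log(D(x_real) + eps).mean()
               + torch.log(1 - D(x_fake) + eps).mean())
# Gradient ascent on θd maximizes this quantity; equivalently, one minimizes the
# binary cross-entropy with real samples labeled 1 and generated samples labeled 0.
```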

A good generator G is one that makes D fail:

$$\min_{\theta_g} \max_{\theta_d} \Big( \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x, \theta_d)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z, \theta_g), \theta_d)\big)\big] \Big)$$

Optimizing D to completion in the inner loop of training is computationally prohibitive, and on finite datasets would result in overfitting. Instead, we alternate between k steps of optimizing D and one step of optimizing G. This results in D being maintained near its optimal solution, as long as G changes slowly enough.

Let’s schematize it:

We need to solve the min-max game above by an iterative numerical approach. In order to do so we alternate (see the sketch after this list):

- $k$ steps of stochastic gradient ascent on $\theta_d$ (with $\theta_g$ fixed), improving the discriminator objective;
- one step of stochastic gradient descent on $\theta_g$ (with $\theta_d$ fixed), improving the generator objective.
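A minimal sketch of this alternation, continuing with the hypothetical `D` and `G` above; the optimizer, the learning rate, `k = 1`, and the placeholder `sample_real_batch` are my own assumptions, not prescriptions from the paper.

```python
import torch

opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(G.parameters(), lr=1e-3)
k, eps = 1, 1e-8

def sample_real_batch(m=64):
    # Placeholder: in a real experiment this would return a minibatch from p_data.
    return torch.randn(m, n)

for step in range(10_000):
    # k steps of gradient ascent on θd (θg kept fixed).
    for _ in range(k):
        x_real, z = sample_real_batch(), torch.randn(64, d)
        d_loss = -(torch.log(D(x_real) + eps).mean()
                   + torch.log(1 - D(G(z).detach()) + eps).mean())  # ascend by minimizing the negative
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

    # One step of gradient descent on θg (θd kept fixed).
    z = torch.randn(64, d)
    g_loss = torch.log(1 - D(G(z)) + eps).mean()  # saturating loss from the minimax game
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```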

There is a reason why Goodfellow proposed to have $G$ maximize $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$. At the beginning of the training process, when the generated samples are easily classified as "fake" (i.e. $D(G(z)) \approx 0$), the loss $\log(1 - D(G(z)))$ saturates and provides too little gradient to learn properly, while $-\log D(G(z))$ still provides a strong gradient.
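One way to see this is to look at the gradient with respect to the discriminator's logit $a$, writing $D = \sigma(a)$: $\frac{d}{da}\log(1 - \sigma(a)) = -\sigma(a)$, which vanishes as $D(G(z)) \to 0$, whereas $\frac{d}{da}\big(-\log \sigma(a)\big) = \sigma(a) - 1 \to -1$. A tiny numerical check (the logit value $-5$ is just an illustrative choice):

```python
import torch

# Logit of the discriminator on a fake sample early in training:
# D(G(z)) = sigmoid(a) is close to 0 because a is very negative.
a = torch.tensor(-5.0, requires_grad=True)

# Saturating generator loss: log(1 - D(G(z)))
torch.log(1 - torch.sigmoid(a)).backward()
print(a.grad)  # ~ -0.0067: almost no gradient signal

a.grad = None
# Non-saturating alternative: -log(D(G(z)))
(-torch.log(torch.sigmoid(a))).backward()
print(a.grad)  # ~ -0.9933: strong gradient signal
```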

We have the following value function for our min-max problem:

$$V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log(1 - D(G(z)))\big]$$

$$= \int_x p_{data}(x)\log(D(x))\,dx + \int_z p_z(z)\log(1 - D(G(z)))\,dz$$

$$= \int_x \Big( p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x)) \Big)\,dx$$

The last equality comes from a change of variables, formally justified by measure theory (the Radon-Nikodym theorem) and sometimes referred to as the Law Of The Unconscious Statistician (LOTUS), since students have been accused of using the identity without realizing that it must be treated as the result of a rigorously proved theorem, not merely a definition.
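Concretely, with $f(x) = \log(1 - D(x))$ and $p_g$ denoting the distribution of $G(z)$ when $z \sim p_z$, the identity used above is:

$$\mathbb{E}_{z \sim p_z(z)}\big[\log(1 - D(G(z)))\big] = \mathbb{E}_{x \sim p_g(x)}\big[\log(1 - D(x))\big] = \int_x p_g(x)\log(1 - D(x))\,dx$$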

$$V(G, D) = \int_x \Big( p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x)) \Big)\,dx$$

Let's first consider the optimal discriminator $D^*$ for any given generator $G$: the training criterion for the discriminator is to maximize the quantity defined below.

$$\arg\max_D V(G, D) = \arg\max_D \int_x \Big( p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x)) \Big)\,dx$$

For an individual sample $x$ we differentiate the integrand w.r.t. $D(x)$ and set this quantity to $0$ in order to find the optimal discriminator $D^*(x)$:

$$\frac{d}{dD(x)}\Big( p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x)) \Big) = 0$$

$$\frac{p_{data}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} = 0$$

$$\frac{p_{data}(x)(1 - D(x)) - p_g(x)D(x)}{D(x)(1 - D(x))} = 0$$

$$\frac{p_{data}(x) - p_{data}(x)D(x) - p_g(x)D(x)}{D(x)(1 - D(x))} = 0$$

$$p_{data}(x) - p_{data}(x)D(x) - p_g(x)D(x) = 0$$

$$p_{data}(x) - D(x)\big(p_{data}(x) + p_g(x)\big) = 0$$

$$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

Does this point represent a maximum? We have to check whether the second derivative, evaluated at $D^*$, is negative.

$$\frac{d}{dD(x)}\Big( p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x)) \Big) = \frac{p_{data}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)}$$

$$\frac{d^2}{dD(x)^2}\Big( p_{data}(x)\log(D(x)) + p_g(x)\log(1 - D(x)) \Big) = \frac{d}{dD(x)}\left( \frac{p_{data}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} \right)$$

$$\frac{d}{dD(x)}\left( \frac{p_{data}(x)}{D(x)} - \frac{p_g(x)}{1 - D(x)} \right) = -\frac{p_{data}(x)}{D(x)^2} - \frac{p_g(x)}{(1 - D(x))^2} < 0$$

The quantity above is negative for every $D$ ($D^*$ included), since $p_{data}(x)$ and $p_g(x)$ are non-negative and $D(x) \in (0, 1)$, so $D^*$ is indeed a maximum.
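As a small numerical illustration of $D^*$ (the two Gaussian densities below are arbitrary choices of mine, not anything from the paper): $D^*$ is close to $1$ where only the data density has mass, close to $0$ where only the generator density has mass, and exactly $\tfrac{1}{2}$ where the two densities coincide.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-4.0, 6.0, 5)
p_data = norm.pdf(x, loc=0.0, scale=1.0)  # hypothetical data density
p_g = norm.pdf(x, loc=2.0, scale=1.0)     # hypothetical generator density

# Optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x))
d_star = p_data / (p_data + p_g)
print(np.round(d_star, 3))  # -> approximately [1., 0.993, 0.5, 0.007, 0.]
```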

We can then plug $D^*$ into $V(G, D)$ and find the optimal generator $G^*$ as:

$$V(G, D^*) = \int_x \Big( p_{data}(x)\log(D^*(x)) + p_g(x)\log(1 - D^*(x)) \Big)\,dx$$

$$G^* = \arg\min_G \int_x \left( p_{data}(x)\log\!\left(\frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right) + p_g(x)\log\!\left(1 - \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right) \right) dx$$

$$G^* = \arg\min_G \int_x \left( p_{data}(x)\log\!\left(\frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right) + p_g(x)\log\!\left(\frac{p_g(x)}{p_{data}(x) + p_g(x)}\right) \right) dx$$

We now add and subtract $\log 2$ inside each term:

$$G^* = \arg\min_G \int_x \left( p_{data}(x)(\log 2 - \log 2) + p_{data}(x)\log\!\left(\frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right) + p_g(x)(\log 2 - \log 2) + p_g(x)\log\!\left(\frac{p_g(x)}{p_{data}(x) + p_g(x)}\right) \right) dx$$

$$G^* = \arg\min_G \left( -\log 2 \int_x \big(p_g(x) + p_{data}(x)\big)\,dx + \int_x p_{data}(x)\left(\log 2 + \log\!\left(\frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right)\right) dx + \int_x p_g(x)\left(\log 2 + \log\!\left(\frac{p_g(x)}{p_{data}(x) + p_g(x)}\right)\right) dx \right)$$

Since both densities integrate to $1$, the first term equals $-2\log 2$:

$$G^* = \arg\min_G \left( -\log 2 \cdot (2) + \int_x p_{data}(x)\log\!\left(\frac{2\,p_{data}(x)}{p_{data}(x) + p_g(x)}\right) dx + \int_x p_g(x)\log\!\left(\frac{2\,p_g(x)}{p_{data}(x) + p_g(x)}\right) dx \right)$$

$$G^* = \arg\min_G \left( -\log 4 + \int_x p_{data}(x)\log\!\left(\frac{p_{data}(x)}{\frac{p_{data}(x) + p_g(x)}{2}}\right) dx + \int_x p_g(x)\log\!\left(\frac{p_g(x)}{\frac{p_{data}(x) + p_g(x)}{2}}\right) dx \right)$$

$$G^* = \arg\min_G \left( -\log 4 + KL\!\left(p_{data} \,\Big\|\, \frac{p_{data} + p_g}{2}\right) + KL\!\left(p_g \,\Big\|\, \frac{p_{data} + p_g}{2}\right) \right)$$

$$G^* = \arg\min_G \Big( -\log 4 + 2\,JSD(p_{data} \,\|\, p_g) \Big)$$

where the Kullback-Leibler divergence (KL) and the Jensen-Shannon divergence (JSD) are quantities that measure the difference between two distributions, and we know that $JSD(p_{data} \,\|\, p_g) \geq 0$, with equality if and only if $p_{data} = p_g$!
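As a quick illustration (with two made-up discrete distributions), the JSD is strictly positive when the distributions differ and exactly zero when they coincide:

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions (assumes q > 0 wherever p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = [0.1, 0.4, 0.5]
print(jsd(p_data, [0.5, 0.3, 0.2]))  # > 0: the distributions differ
print(jsd(p_data, p_data))           # 0.0: identical distributions
```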

Plugging $p_g = p_{data}$ back in, the JSD term vanishes and the criterion attains its global minimum:

$$V(D^*_G, G) = -\log 4$$

Theorem 1:

The global minimum of the virtual training criterion $V(D^*_G, G)$ is achieved if and only if $p_g = p_{data}$. At that point, $V(D^*_G, G)$ achieves the value $-\log 4$.

And that is exactly what we expected! We wanted our generator to learn the very distribution that generated the data. If $p_{data} = p_g$, it's easy to see that at the end of the training process the optimal discriminator is forced to output $0.5$, since it can no longer distinguish between real and fake samples:

$$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} = \frac{1}{2}$$

But does this converge?

Well, as stated in the original Paper:

If $G$ and $D$ have enough capacity, and at each step of our algorithm the discriminator is allowed to reach its optimum given $G$, and $p_g$ is updated so as to improve the criterion

$$\mathbb{E}_{x \sim p_{data}}\big[\log D^*_G(x)\big] + \mathbb{E}_{x \sim p_g}\big[\log(1 - D^*_G(x))\big]$$

then $p_g$ converges to $p_{data}$.

Proof:

Consider $V(G, D) = U(p_g, D)$ as a function of $p_g$, as done in the above criterion. Note that $U(p_g, D)$ is convex in $p_g$.

The subderivatives of a supremum of convex functions include the derivative of the function at the point where the maximum is attained. In other words, if $f(x) = \sup_{\alpha \in A} f_\alpha(x)$ and $f_\alpha(x)$ is convex in $x$ for every $\alpha$, then $\partial f_\beta(x) \subseteq \partial f(x)$ if $\beta = \arg\sup_{\alpha \in A} f_\alpha(x)$. This is equivalent to computing a gradient descent update for $p_g$ at the optimal $D$ given the corresponding $G$. $\sup_D U(p_g, D)$ is convex in $p_g$ with a unique global optimum, as proven in Theorem 1; therefore, with sufficiently small updates of $p_g$, $p_g$ converges to $p_{data}$, concluding the proof.

In practice, adversarial nets represent a limited family of pg distributions via the function G(z;θg), and we optimise θg rather than pg itself. Using a multilayer perceptron to define G introduces multiple critical points in parameter space. However, the excellent performance of multilayer perceptrons in practice suggests that they are a reasonable model to use despite their lack of theoretical guarantees.

Footnotes

    Citation

    For attribution, please cite this work as

    Bonvini (2020, Dec. 14). Last Week's Potatoes: Generative Adversarial Networks (Original Formulation). Retrieved from https://lastweekspotatoes.com/posts/2021-07-22-generative-adversarial-networks-original-formulation/

    BibTeX citation

    @misc{bonvini2020generative,
      author = {Bonvini, Andrea},
      title = {Last Week's Potatoes: Generative Adversarial Networks (Original Formulation)},
      url = {https://lastweekspotatoes.com/posts/2021-07-22-generative-adversarial-networks-original-formulation/},
      year = {2020}
    }