SSL = create a supervised task automatically from raw data. We basically hide part of the input and ask the model to predict it, like masking words in BERT — the labels come for free.
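A tiny sketch of that idea, assuming PyTorch and a toy setup where token id 0 is reserved as the [MASK] token: hide ~15% of the tokens and compute the loss only on the hidden positions.

```python
import torch
import torch.nn as nn

vocab_size, mask_id = 1000, 0                     # assume id 0 is reserved for [MASK]
tokens = torch.randint(1, vocab_size, (8, 32))    # raw, unlabeled token sequences

mask = torch.rand(tokens.shape) < 0.15            # hide ~15% of the tokens
inputs = tokens.masked_fill(mask, mask_id)        # corrupted input the model sees

model = nn.Sequential(                            # stand-in encoder, not real BERT
    nn.Embedding(vocab_size, 64),
    nn.TransformerEncoder(nn.TransformerEncoderLayer(64, 4, batch_first=True), 2),
    nn.Linear(64, vocab_size),
)

logits = model(inputs)                            # (batch, seq, vocab)
# Supervision came for free: the "labels" are just the original hidden tokens.
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
```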
SSL in Vision
Early pretext tasks: relative patch position, colorization, rotation prediction… Masking is awkward with CNNs because you can't mask pixels cleanly: convolutions mix information across local neighborhoods, so masked regions leak context. Vision Transformers (ViT) changed this by treating image patches as tokens, so masking a patch is just dropping a token — enabling MAE. MAE → Masked Autoencoders (BERT for images)
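A loosely MAE-flavoured sketch, with assumed 16×16 patches, linear stand-ins for the ViT encoder/decoder, and a pooled code instead of the real per-patch mask tokens: patchify, keep ~25% of patches visible, and reconstruct the pixels of the dropped ones.

```python
import torch
import torch.nn as nn

B, P, D = 4, 196, 16 * 16 * 3                      # batch, patches per image, patch dim
img = torch.randn(B, 3, 224, 224)                  # unlabeled images

# 1. Patchify: each 16x16 patch becomes one "token", like a word in BERT.
patches = img.unfold(2, 16, 16).unfold(3, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, P, D)

# 2. Randomly split patch indices into visible (~25%) and masked (~75%).
perm = torch.rand(B, P).argsort(dim=1)
vis_idx, mask_idx = perm[:, :49], perm[:, 49:]
gather = lambda x, idx: torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))

# 3. Encode only the visible patches; decode a prediction for every masked one.
#    (Real MAE uses a ViT encoder and a decoder with per-patch mask tokens;
#    here everything is collapsed to a single pooled code for brevity.)
encoder = nn.Linear(D, 128)
decoder = nn.Linear(128, D)
latent = encoder(gather(patches, vis_idx)).mean(1, keepdim=True)
pred = decoder(latent.expand(-1, P - 49, -1))

# 4. The loss is pixel reconstruction of the *masked* patches only.
loss = nn.functional.mse_loss(pred, gather(patches, mask_idx))
```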
How to evaluate?
- Linear probe: freeze the pretrained encoder, train only a linear classifier on its features
- Fine-tuning: unfreeze everything and train end to end on the labeled task (both sketched below)
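A rough sketch of both protocols, assuming a torchvision ResNet-18 as a stand-in for an SSL-pretrained encoder and a 10-class downstream task.

```python
import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet18()          # pretend these are SSL weights
backbone.fc = nn.Identity()                       # expose the 512-d features
head = nn.Linear(512, 10)                         # downstream classifier head
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))

# 1) Linear probe: freeze the encoder, train only the linear head.
for p in backbone.parameters():
    p.requires_grad = False
probe_opt = torch.optim.SGD(head.parameters(), lr=0.1)
loss = nn.functional.cross_entropy(head(backbone(x)), y)
probe_opt.zero_grad(); loss.backward(); probe_opt.step()

# 2) Fine-tuning: unfreeze everything and train end to end
#    (a smaller LR for the pretrained encoder is a common choice).
for p in backbone.parameters():
    p.requires_grad = True
ft_opt = torch.optim.SGD([{"params": backbone.parameters(), "lr": 1e-3},
                          {"params": head.parameters(), "lr": 1e-2}], momentum=0.9)
loss = nn.functional.cross_entropy(head(backbone(x)), y)
ft_opt.zero_grad(); loss.backward(); ft_opt.step()
```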
Semi-supervised: labeled + unlabeled jointly → one model, two heads
Unsupervised pre-training: first train SSL on a large unlabeled set → then fine-tune on the labeled set
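A loose sketch of the "one model, two heads" idea, assuming rotation prediction as the SSL task on the unlabeled batch: a shared encoder feeds a supervised head and a self-supervised head, and the two losses are summed.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
cls_head = nn.Linear(256, 10)     # supervised head (assumed 10 classes)
rot_head = nn.Linear(256, 4)      # SSL head: predict 0/90/180/270 degree rotation

x_lab, y_lab = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
x_unlab = torch.randn(64, 3, 32, 32)

k = torch.randint(0, 4, (64,))                        # random rotation per image
x_rot = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                     for img, r in zip(x_unlab, k)])

loss_sup = nn.functional.cross_entropy(cls_head(encoder(x_lab)), y_lab)
loss_ssl = nn.functional.cross_entropy(rot_head(encoder(x_rot)), k)
loss = loss_sup + loss_ssl        # one optimizer step trains both heads jointly
```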
Generative Models
GANs
A way to build a generative model by having two NNs compete with each other: a generator turns noise into fake samples, and a discriminator tries to tell real data from fakes.
GANs produce very sharp, high-res samples, but they’re hard to train (unstable, prone to mode collapse) and there’s no explicit likelihood function.
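A minimal GAN training step with toy MLPs and BCE losses; the sizes and learning rates are placeholders, not any specific published GAN.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 784) * 2 - 1                 # stand-in for a real data batch
z = torch.randn(32, 64)                            # noise input for the generator

# Discriminator step: push real samples towards 1, fakes towards 0.
fake = G(z).detach()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator into calling fakes real.
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```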
Stable Diffusion
Start with a real image and progressively add noise, then train a model to work backwards, producing a less noisy image from a noisier one. At sampling time you start from pure noise and denoise step by step to generate a new image.
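A hedged DDPM-style sketch of the training objective: noise a clean image at a random timestep in closed form and train a small net to predict the added noise. The schedule and the tiny conv net are illustrative stand-ins, and the timestep conditioning of the real model is omitted for brevity.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)              # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative signal retention

model = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(32, 3, 3, padding=1))   # stand-in noise predictor

x0 = torch.randn(8, 3, 32, 32)                     # clean training images
t = torch.randint(0, T, (8,))                      # one random timestep per image
noise = torch.randn_like(x0)

# Forward (noising) process in closed form:
#   x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise
a_bar = alphas_bar[t].view(-1, 1, 1, 1)
x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise

# The reverse direction is learned: predict the noise that was added at step t
# (a real model would also condition on t).
loss = nn.functional.mse_loss(model(x_t), noise)
```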
