Build an Autoencoder Using PyTorch | Codecademy
— Note: This post complements part 1 of the Variational Autoencoders (VAEs) event series. —
Table of Contents:
- Introducing Autoencoders
- Reconstructing MNIST
- Wrapping Up
Introduction to Autoencoders
An autoencoder is a neural network that learns to compress its input and then reconstruct it. Put simply, it’s a machine that sees something, remembers the main parts, and then tries to draw it again.
For example, an autoencoder takes a handwritten digit image (high-dimensional data) and compresses it through an encoder into a smaller set of values (latent representation) that captures the key features of the digit. Then, a decoder uses this latent representation to recreate an image (decoded image) that closely resembles the original digit.
An autoencoder has three main parts that work together to learn hidden patterns in data:
- Encoder: This acts as a compressor. It takes the input data (like an image of a handwritten digit) and squeezes it through a bottleneck into a smaller representation in the latent space. This representation captures the most important features of the data, like the shape and position of the digit in the image.
- Latent Space: This is the compressed version of the original data created by the encoder. It’s like a secret code that holds the essence of the input. In our example, it would contain information about the curves and lines that make up the digit.
- Decoder: This acts like an artist. It takes the compressed code from the latent space and tries to rebuild the original data (like recreating the image of the digit). During training, the reconstruction is compared to the original, and the difference is used to improve both the encoder and the decoder.
What is a Latent Representation?
A latent representation is a compact code for your data: a short list of numbers that holds the essence of the original.
Imagine taking a picture of some cute dogs and cats, and then summarizing them with just the shapes of their ears, or their little whiskers — that’s kind of like a latent representation. It keeps the important details but throws away unnecessary information.
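To make the idea concrete, here is a toy sketch in plain Python. It is hand-written, not learned: it assumes every image is a filled square, so three numbers (top row, left column, side length) describe a 28 × 28 image completely. The helpers encode_square and draw_square are hypothetical names used only for illustration.

```python
# A hand-written toy "encoder/decoder" (not learned): every image in this
# family is a filled square, so three numbers describe it completely.
SIZE = 28  # MNIST images are 28 x 28 pixels

def draw_square(top, left, side):
    """Decoder: rebuild a 28 x 28 image (list of rows) from the 3-number code."""
    return [
        [1 if top <= r < top + side and left <= c < left + side else 0
         for c in range(SIZE)]
        for r in range(SIZE)
    ]

def encode_square(image):
    """Encoder: keep only the square's position and size, discard the rest."""
    lit = [(r, c) for r in range(SIZE) for c in range(SIZE) if image[r][c]]
    top = min(r for r, _ in lit)
    left = min(c for _, c in lit)
    side = max(r for r, _ in lit) - top + 1
    return (top, left, side)

image = draw_square(5, 8, 10)   # 784 pixel values
code = encode_square(image)     # 3 values: the "latent representation"
rebuilt = draw_square(*code)

print(sum(len(row) for row in image), "->", len(code))  # 784 -> 3
print(rebuilt == image)                                 # True
```

Real handwriting varies in far more ways than a square does, so a real autoencoder has to learn its code from data, and for real digits that code is usually lossy.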
Now that you know the essentials, it’s time to code. Your task is to reconstruct the MNIST dataset using PyTorch and TorchVision.
Reconstructing the MNIST Dataset Using PyTorch and TorchVision
Step 1: Build the Structure of Your Model
Let’s start by importing all the necessary modules you’ll be using for this task.
TorchVision is the part of the PyTorch ecosystem focused on computer vision. It provides popular image datasets (including MNIST) and common image transformations. You will use it to download MNIST and convert its images into tensors.
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
```
Time to define your model structure.
In PyTorch, you define the structure of your model using classes.
You’re creating a class called Autoencoder that inherits from nn.Module, the base class for all neural network modules in PyTorch.
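```python
class Autoencoder(nn.Module):
    def __init__(self, encoding_dim):
        super(Autoencoder, self).__init__()
        # Encoder: compress a flattened 28 x 28 image (784 values) to encoding_dim values
        self.encoder = nn.Sequential(
            nn.Linear(784, encoding_dim),
            nn.ReLU()
        )
        # Decoder: expand the code back to 784 values, each squashed into (0, 1)
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 784),
            nn.Sigmoid()
        )

    def forward(self, x):
        encoded = self.encoder(x)        # latent representation
        decoded = self.decoder(encoded)  # reconstruction
        return decoded
```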
The __init__ method sets up the layers of the model.
The encoding_dim parameter is the size of the encoded (compressed) representation. You’ll set it in Step 2.
Let’s dig deeper and understand what’s happening in self.encoder and self.decoder.
Encoder:
- self.encoder is a sequence of layers that compresses the input.
- nn.Linear(784, encoding_dim) creates a fully connected layer that transforms the input from 784 dimensions to encoding_dim dimensions.
- nn.ReLU() applies the ReLU activation function, which introduces non-linearity to the model.
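The encoder’s two layers can be checked by hand. This sketch assumes only PyTorch and uses a random batch as a stand-in for real images. It confirms that nn.Linear(784, encoding_dim) computes x @ W.T + b with a weight matrix of shape (encoding_dim, 784), and that nn.ReLU() then replaces negative values with zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
encoding_dim = 32
linear = nn.Linear(784, encoding_dim)   # weight: (32, 784), bias: (32,)
relu = nn.ReLU()

x = torch.rand(4, 784)                  # a batch of 4 flattened "images"
encoded = relu(linear(x))

# The same computation by hand: x @ W^T + b, then replace negatives with 0
manual = torch.clamp(x @ linear.weight.T + linear.bias, min=0)

print(linear.weight.shape, linear.bias.shape)      # torch.Size([32, 784]) torch.Size([32])
print(encoded.shape)                               # torch.Size([4, 32])
print(torch.allclose(encoded, manual, atol=1e-6))  # True
```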
Decoder:
- self.decoder is a sequence of layers that reconstructs the input from the encoded representation.
- nn.Linear(encoding_dim, 784) creates a fully connected layer that transforms the encoded representation back to 784 dimensions (the original size).
- nn.Sigmoid() applies the Sigmoid activation function, which squashes each output into the range 0 to 1. This matches the pixel values of the input images, which ToTensor() scales to the range [0, 1] in Step 3.
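Here is the Sigmoid function written out in plain Python, as a small illustration of how it maps any real number into the range (0, 1). The helper name sigmoid is just for this sketch; nn.Sigmoid() applies the same formula element-wise to a tensor.

```python
import math

def sigmoid(z):
    """The function nn.Sigmoid applies element-wise: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Whatever the decoder's linear layer outputs, sigmoid maps it into (0, 1),
# the same range as the pixel values the model is trained to reproduce.
for z in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"{z:+.1f} -> {sigmoid(z):.4f}")
```

Large negative inputs land close to 0 (a black pixel), large positive inputs land close to 1 (a white pixel), and 0 maps to exactly 0.5.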
Forward Pass: The forward method defines how data passes through the model.
- encoded = self.encoder(x): passes the input x through the encoder to get the encoded representation.
- decoded = self.decoder(encoded): passes the encoded representation through the decoder to get the reconstructed output.
- return decoded: returns the reconstructed output.
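A quick way to check the architecture before training is to push a random batch through it and inspect the shapes. This sketch repeats the Step 1 class in compact form so it runs on its own; the random tensor is a stand-in for real MNIST images.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):  # same model as Step 1, written compactly
    def __init__(self, encoding_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, encoding_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(encoding_dim, 784), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder(encoding_dim=32)
x = torch.rand(8, 784)  # random stand-in for 8 flattened MNIST images

with torch.no_grad():        # no training here, so skip gradient tracking
    code = model.encoder(x)  # latent representation
    recon = model(x)         # full forward pass: encoder, then decoder

n_params = sum(p.numel() for p in model.parameters())
print(code.shape)   # torch.Size([8, 32])
print(recon.shape)  # torch.Size([8, 784])
print(n_params)     # 50992
```

The parameter count comes from the two 784 × 32 weight matrices plus the biases: 25,120 in the encoder and 25,872 in the decoder.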
Step 2: Set Your Hyperparameters
```python
encoding_dim = 32
batch_size = 256
num_epochs = 50
learning_rate = 0.001
```
encoding_dim = 32:
This sets the size of the encoded (compressed) representation in the autoencoder. Each input image is 28 × 28 = 784 pixels, flattened into a 784-dimensional vector, and will be compressed to 32 values, roughly 24.5 times fewer.
batch_size = 256:
This sets the number of samples processed before the model’s internal parameters are updated. Instead of updating the model for each image, you update it after processing 256 images at once.
num_epochs = 50:
This sets the number of times the entire training dataset passes through the model. Over 50 epochs, the model sees every training image 50 times, refining its weights on each pass.
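Together, batch_size and num_epochs determine how many parameter updates training performs. Assuming the standard MNIST training split of 60,000 images and the DataLoader default of keeping the final, smaller batch (drop_last=False), the arithmetic works out like this:

```python
import math

num_train_images = 60_000  # size of the MNIST training split
batch_size = 256
num_epochs = 50

batches_per_epoch = math.ceil(num_train_images / batch_size)  # the last batch is smaller
last_batch_size = num_train_images - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * num_epochs                 # one optimizer step per batch

print(batches_per_epoch, last_batch_size, total_updates)  # 235 96 11750
```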
learning_rate = 0.001:
This sets the rate at which the model updates its parameters in response to the computed error during training.
A smaller learning rate means smaller updates and more stable convergence, while a larger learning rate means faster updates but a risk of overshooting the optimal values.
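The effect of the learning rate is easiest to see on a one-parameter problem. This sketch uses plain gradient descent on f(w) = w², a deliberate simplification: Adam, which you’ll use in Step 4, adapts its step size for each parameter, but the same trade-off between small and large steps applies. The helper gradient_descent is hypothetical.

```python
def gradient_descent(lr, steps=5, w=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) with plain gradient descent."""
    history = [w]
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad  # the basic update: step against the gradient
        history.append(w)
    return history

print([round(w, 4) for w in gradient_descent(lr=0.1)])  # shrinks steadily toward 0
print([round(w, 4) for w in gradient_descent(lr=1.1)])  # overshoots 0 and grows
```

With lr=0.1 each step multiplies w by 0.8, so it converges toward the minimum at 0. With lr=1.1 each step multiplies w by -1.2, so it jumps past the minimum and moves further away every time.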
Step 3: Load the MNIST Dataset
In this step, your goal is to load the MNIST dataset and prepare your data for training.
You’ll be using PyTorch’s DataLoaders.
DataLoaders in PyTorch are tools that help you efficiently feed data into your machine learning models. Here’s what they do:
- Preparation: They take your dataset and organize it into batches.
- Iteration: They provide an easy way to loop through these batches during training.
- Shuffling: They can automatically shuffle the data to improve learning.
- Parallelism: They can load data in parallel using multiple worker processes (set with the num_workers argument), making the process faster.
- Memory efficiency: They load data in chunks, so you don’t need to load the entire dataset into memory at once.
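Here is a small, self-contained look at batching and shuffling. It assumes a synthetic TensorDataset of random 28 × 28 images as a stand-in for MNIST, so nothing needs to be downloaded.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 1,000 single-channel 28 x 28 images with labels
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

loader = DataLoader(
    dataset,
    batch_size=256,
    shuffle=True,   # reshuffle the order at the start of every epoch
    num_workers=0,  # values > 0 load batches in background worker processes
)

batch_sizes = [img.size(0) for img, _ in loader]
print(len(loader), batch_sizes)  # 4 [256, 256, 256, 232]
```

Shuffling changes which images land in each batch, not how many: the final batch simply holds the 232 leftover images.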
Let’s go ahead and load your dataset!
```python
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
```
Here is a step-by-step explanation of what each line of code does:
- transform = transforms.Compose([transforms.ToTensor()]): This creates a transformation that converts each image to a PyTorch tensor of shape (1, 28, 28) and scales its pixel values from integers 0 to 255 to floats between 0.0 and 1.0.
- train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True): This loads the MNIST training dataset (60,000 images). It is saved in the './data' directory and downloaded if it isn't already there. Each image is converted to a tensor by transform.
- test_dataset = datasets.MNIST(root='./data', train=False, transform=transform, download=True): This does the same for the MNIST test dataset (10,000 images).
- train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True): This creates a DataLoader for the training data. It serves the data in batches of size batch_size and reshuffles it every epoch.
- test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False): This creates a DataLoader for the test data. It also serves data in batches but doesn't shuffle it, since order doesn't matter for evaluation.
Step 4: Initialize the Model, Loss Function, and Optimizer
Now that you have your model structure, hyperparameters, and data ready, it’s time to instantiate your model, pick a loss function, and set up your optimizer.
Note: The loss function measures how far the model’s output is from the target, and the optimizer adjusts the model’s parameters during training to reduce that loss.
```python
model = Autoencoder(encoding_dim)
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
```
- model = Autoencoder(encoding_dim): Creates an instance of the Autoencoder class, which you defined in Step 1.
- criterion = nn.BCELoss(): Sets up the loss function. BCELoss stands for Binary Cross Entropy Loss, which is commonly used for autoencoders when the input data is normalized between 0 and 1. It expects the model’s outputs to lie between 0 and 1 as well, which the decoder’s final Sigmoid layer guarantees.
- optimizer = optim.Adam(model.parameters(), lr=learning_rate): Creates an Adam optimizer, a popular choice for training neural networks. It updates the model's parameters to minimize the loss. The lr parameter sets the learning rate and controls how big the parameter updates are.
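To see what nn.BCELoss() actually computes, here is a sketch that compares it with the binary cross-entropy formula written out by hand, using a few made-up pixel values rather than real model outputs.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # averages over every element by default (reduction="mean")

target = torch.tensor([0.0, 0.5, 1.0, 1.0])  # "original" pixel values in [0, 1]
output = torch.tensor([0.1, 0.4, 0.9, 0.6])  # "reconstructed" values from the Sigmoid

# Binary cross-entropy by hand: -mean(y*log(p) + (1 - y)*log(1 - p))
manual = -(target * torch.log(output) + (1 - target) * torch.log(1 - output)).mean()

print(criterion(output, target).item(), manual.item())
print(torch.allclose(criterion(output, target), manual))  # True
```

One consequence worth knowing: for a grey target such as 0.5, the loss is smallest when the output equals the target, but that minimum is above zero. Because MNIST digits have grey, anti-aliased edges, the training loss in Step 5 can’t reach 0 even for a very good reconstruction.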
Step 5: Train Your Model
With your model, loss function, and optimizer in place, it’s time to begin the training process.
```python
for epoch in range(num_epochs):
    for data in train_loader:
        img, _ = data                    # labels are not needed
        img = img.view(img.size(0), -1)  # flatten (batch, 1, 28, 28) -> (batch, 784)

        # Forward pass
        output = model(img)
        loss = criterion(output, img)    # the target is the input itself

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
```

```
Epoch [1/50], Loss: 0.2110
Epoch [2/50], Loss: 0.1808
Epoch [3/50], Loss: 0.1500
...
Epoch [49/50], Loss: 0.1105
Epoch [50/50], Loss: 0.1175
```
This code block runs the training loop for the autoencoder. It repeats for a set number of epochs, each time going through all the training data. For each batch, it:
1. Gets a batch of images and flattens each one into a 784-value vector
2. Passes them through the model
3. Compares the output to the original images
4. Calculates how far off the model is (the loss)
5. Computes gradients with loss.backward() and updates the weights with optimizer.step() to reduce this error
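The model gradually improves its ability to recreate the input images. The loss printed after each epoch is the loss on that epoch’s last batch, so it fluctuates slightly (for example, it rises from 0.1105 to 0.1175 between epochs 49 and 50), but its downward trend shows the model is getting better. This process is the core of how the autoencoder learns to compress and reconstruct images.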
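A steadier measure than the last-batch loss is the average loss over a whole dataset. The sketch below is one way to compute it; evaluate is a hypothetical helper, and the demo uses a stand-in model and random "images" so it runs on its own. After training, you could call it as evaluate(model, test_loader, criterion) to measure reconstruction quality on images the model never trained on.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Average reconstruction loss over every batch in `loader` (no weight updates)."""
    model.eval()                    # good habit; this model has no dropout or batch norm
    total, count = 0.0, 0
    with torch.no_grad():           # gradients aren't needed for evaluation
        for img, _ in loader:
            img = img.view(img.size(0), -1)
            loss = criterion(model(img), img)
            total += loss.item() * img.size(0)  # weight each batch by its size
            count += img.size(0)
    model.train()                   # switch back before training resumes
    return total / count

# Demo with a stand-in model and random "images" instead of MNIST
model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid())
data = TensorDataset(torch.rand(300, 1, 28, 28), torch.zeros(300))
test_loader = DataLoader(data, batch_size=256, shuffle=False)
print(f"Test loss: {evaluate(model, test_loader, nn.BCELoss()):.4f}")
```

Weighting each batch by its size keeps the smaller final batch from counting as much as a full one, so the result doesn’t depend on batch_size.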
You did it! You just created your first autoencoder! You can play around with your hyperparameters to see how they affect the loss.
I bet you’re curious to see how your model reconstructs images from the test dataset! Here it is: (if you’re curious how you can visualize your reconstructions, look at this)
Your model is performing well. The reconstructed images are somewhat blurry, but the handwritten digits are still clearly identifiable. This shows the model has learned to capture the essential features of the digits, even if it misses some finer details.
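If you’d like to produce a comparison like this yourself, here is one possible sketch. It assumes matplotlib is installed; plot_reconstructions is a hypothetical helper, not necessarily how the post’s figure was made, and the demo uses random tensors in place of real images.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import torch

def plot_reconstructions(originals, reconstructions, n=8):
    """Top row: original images; bottom row: the model's reconstructions."""
    fig, axes = plt.subplots(2, n, figsize=(n * 1.5, 3), squeeze=False)
    for i in range(n):
        for row, images in enumerate((originals, reconstructions)):
            ax = axes[row, i]
            ax.imshow(images[i].reshape(28, 28).detach().cpu().numpy(), cmap="gray")
            ax.axis("off")
    axes[0, 0].set_title("original")
    axes[1, 0].set_title("reconstructed")
    fig.tight_layout()
    return fig

# Demo with random tensors instead of a real test batch and model output
originals = torch.rand(8, 784)
reconstructions = torch.rand(8, 784)
fig = plot_reconstructions(originals, reconstructions)
fig.savefig("reconstructions.png")
```

With the trained model, you could take a test batch with img, _ = next(iter(test_loader)), flatten it with img.view(img.size(0), -1), and pass img and model(img) to the helper inside a torch.no_grad() block.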
Wrapping Up
Congratulations, you’ve made it to the end of this blog post.
For a more detailed explanation of autoencoders, take a look at this notebook.
This blog post is supplementary material for Codecademy’s Machine and Deep Learning Club.