Multilayer Perceptron (MLP) with PyTorch on MNIST#
After implementing an MLP from scratch, it is useful to reproduce the same model in PyTorch. This gives you (1) a correctness check against a widely used framework and (2) a baseline for future experiments (regularization, better optimizers, GPUs, etc.). This notebook demonstrates how to train an MLP on the MNIST dataset using PyTorch.
Prerequisites#
Install the required packages:
# pip install torch torchvision
1. Imports and Device Setup#
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cpu')
2. Load the MNIST Dataset with torchvision#
MNIST images are 28×28 grayscale. For an MLP, we flatten each image into a 784-dimensional vector.
We will use datasets from torchvision to load the MNIST handwritten digits dataset. You can find the full list of available datasets in the torchvision documentation. Now let’s take a look at the parameters we set:
- root sets the directory we store and load our data from.
- train indicates whether we want the training dataset or the test dataset.
- transform allows us to apply transformations to our data. Here we only convert the data to tensors so that they work with PyTorch; in future notebooks you will see more complicated transformations.
transform = transforms.Compose([
    transforms.ToTensor()
])
train_dataset = datasets.MNIST(
    root='data', train=True, download=True, transform=transform
)
test_dataset = datasets.MNIST(
    root='data', train=False, download=True, transform=transform
)
print(f"Training data: {train_dataset}\n")
print(f"Test data: {test_dataset}")
Training data: Dataset MNIST
    Number of datapoints: 60000
    Root location: data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
           )

Test data: Dataset MNIST
    Number of datapoints: 10000
    Root location: data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
           )
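ToTensor converts each PIL image to a float tensor of shape (1, 28, 28) with pixel values scaled to [0, 1]. A quick check on a single sample (a minimal sketch, not part of the training pipeline):
img, label = train_dataset[0]
print(img.shape, img.dtype)                # torch.Size([1, 28, 28]) torch.float32
print(img.min().item(), img.max().item())  # 0.0 1.0
print(label)                               # an int class label (5 for the first training image)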
Data Loaders#
To make loading and batching the data easier, we use DataLoader from torch.utils.data. A DataLoader wraps a dataset, takes a batch_size parameter, and lets us iterate over the data in mini-batches. After creating the loaders, we do one iteration just to see the data shapes:
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False)
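One iteration over train_loader, with the resulting shapes shown in the comments:
x, y = next(iter(train_loader))     # grab a single batch
print(x.shape)                      # torch.Size([128, 1, 28, 28]): batch of images
print(y.shape)                      # torch.Size([128]): integer class labels
print(x.view(x.size(0), -1).shape)  # torch.Size([128, 784]): flattened, as the MLP sees them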
3. Define the MLP Model#
This is a standard fully connected network: 784 → hidden → hidden → 10. We do not apply softmax inside the model because CrossEntropyLoss expects raw logits.
class MLP(nn.Module):
    def __init__(self, input_dim=28*28, hidden1=256, hidden2=128, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden1),
            nn.ReLU(),
            nn.Linear(hidden1, hidden2),
            nn.ReLU(),
            nn.Linear(hidden2, num_classes)  # raw logits, no softmax here
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten (B, 1, 28, 28) -> (B, 784)
        return self.net(x)
model = MLP().to(device)
model
MLP(
  (net): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=10, bias=True)
  )
)
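As a quick sanity check, the parameter count should match the layer sizes: 784·256 + 256 weights and biases in the first layer, 256·128 + 128 in the second, and 128·10 + 10 in the output layer, for 235,146 trainable parameters in total:
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)  # 235146 = (784*256 + 256) + (256*128 + 128) + (128*10 + 10)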
4. Loss Function and Optimizer#
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
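CrossEntropyLoss applies log-softmax internally and then computes the negative log-likelihood, which is why the model outputs raw logits. A minimal check of that equivalence on random data:
import torch.nn.functional as F
fake_logits = torch.randn(4, 10)           # a fake batch of raw model outputs
fake_targets = torch.randint(0, 10, (4,))  # fake class labels
a = criterion(fake_logits, fake_targets)
b = F.nll_loss(F.log_softmax(fake_logits, dim=1), fake_targets)
print(torch.allclose(a, b))  # True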
5. Training and Evaluation Functions#
def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss, correct, total = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits = model(x)
        loss = criterion(logits, y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * x.size(0)  # weight by batch size for a correct mean
        correct += (logits.argmax(1) == y).sum().item()
        total += y.size(0)
    return total_loss / total, correct / total
@torch.no_grad()  # disable autograd: faster and less memory during evaluation
def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        logits = model(x)
        loss = criterion(logits, y)
        total_loss += loss.item() * x.size(0)
        correct += (logits.argmax(1) == y).sum().item()
        total += y.size(0)
    return total_loss / total, correct / total
6. Train the Model#
epochs = 5
for epoch in range(1, epochs + 1):
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device)
    test_loss, test_acc = evaluate(model, test_loader, criterion, device)
    print(f'Epoch {epoch:02d} | Train Acc: {train_acc:.4f} | Test Acc: {test_acc:.4f}')
Epoch 01 | Train Acc: 0.9032 | Test Acc: 0.9498
Epoch 02 | Train Acc: 0.9601 | Test Acc: 0.9679
Epoch 03 | Train Acc: 0.9733 | Test Acc: 0.9736
Epoch 04 | Train Acc: 0.9802 | Test Acc: 0.9751
Epoch 05 | Train Acc: 0.9853 | Test Acc: 0.9745
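To use the trained model for inference, take the argmax of the logits. A minimal sketch on one test batch:
model.eval()
with torch.no_grad():
    x, y = next(iter(test_loader))
    preds = model(x.to(device)).argmax(1).cpu()
print(preds[:10])  # predicted digits for the first ten test images
print(y[:10])      # ground-truth digits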