PyTorch: An introduction#

This section provides essential information and techniques for using PyTorch in machine learning workflows.

# The following command shows the CUDA version supported by the GPU driver.
# The GPU driver connects the GPU to the operating system.
# First, install the GPU driver. Then (optionally) install CUDA. After that,
# use nvidia-smi to check the CUDA version supported by the driver,
# and finally install PyTorch.
#
# Fix: a bare `nvidia-smi` is parsed by Python as the expression
# `nvidia - smi` and raises NameError (see the traceback below). In a
# notebook use `!nvidia-smi`; in plain Python, run it as a subprocess.
import subprocess

try:
    subprocess.run(["nvidia-smi"], check=False)
except FileNotFoundError:
    print("nvidia-smi not found (no NVIDIA driver on this machine)")

# go to pytorch.org to install pytorch
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 5
      1 # The following command shows the CUDA version supported by the GPU driver. The GPU driver connects the GPU to the operating system.
      2 # First, install the GPU driver. Then (optionally) install CUDA. After that, use nvidia-smi to check the CUDA version supported by the driver, 
      3 # and finally install PyTorch.
----> 5 nvidia-smi
      7 # go to the pytorch.com to install pytorch

NameError: name 'nvidia' is not defined
import torch

# Report the installed PyTorch build and what CUDA support it can see:
# version string, CUDA reachability, compiled-against CUDA version
# (None on CPU-only builds), and the visible device count.
diagnostics = (
    torch.__version__,
    torch.cuda.is_available(),
    torch.version.cuda,
    torch.cuda.device_count(),
)
for value in diagnostics:
    print(value)
# Name of the first CUDA device, when one exists.
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
2.9.1
False
None
0
# pip install torch torchvision torchaudio
# A 3x3 tensor of uniform [0, 1) samples, allocated on the CPU.
x_cpu = torch.rand(3, 3)
print(x_cpu)
tensor([[0.5395, 0.9045, 0.6438],
        [0.5079, 0.7671, 0.5273],
        [0.0292, 0.8800, 0.8765]])
# Pick the best available device and copy the tensor there
# (a no-op when the tensor is already on the CPU).
if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"
x_device = x_cpu.to(device)
print(x_device)
tensor([[0.5395, 0.9045, 0.6438],
        [0.5079, 0.7671, 0.5273],
        [0.0292, 0.8800, 0.8765]])
import torch

# Use Apple's Metal Performance Shaders backend when it is available
# (Apple Silicon macOS); fall back to CPU so this cell also runs on
# machines without MPS instead of raising at torch.device creation.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(3, 3).to(device)
print(x)
tensor([[-0.4039,  0.1722, -1.9797],
        [ 0.1560,  0.5489, -1.9454],
        [-1.2082,  0.6214,  1.4145]], device='mps:0')
# Summarize which acceleration back-ends this PyTorch build supports.
report = [
    ("Pytorch version is", torch.__version__),
    ("CUDA available", torch.cuda.is_available()),
    ("CUDA version is", torch.version.cuda),
    ("MPS available", torch.backends.mps.is_available()),
    ("MPS built", torch.backends.mps.is_built()),
]
for label, value in report:
    print(f"{label}: {value}")
Pytorch version is: 2.9.1
CUDA available: False
CUDA version is: None
MPS available: True
MPS built: True

Vector, row, and column tensors#

def _show(t, note=None):
    # Print a tensor, then either a custom note or its shape, then its dtype.
    print(t)
    print(note if note is not None else t.shape)
    print(t.dtype)

# 1-D tensor (a vector): a single dimension of length 5.
a = torch.tensor([1, 2, 3, 4, 5])
_show(a, f"a is a vector: {a.shape}")

# 2-D row tensor: shape (1, 5).
a = torch.tensor([[1, 2, 3, 4, 5]])
_show(a)

# 2-D column tensor: shape (5, 1), with an explicit float dtype.
a = torch.tensor([[1], [2], [3], [4], [5]], dtype=torch.float32)
_show(a)

# 2-D matrix: shape (2, 5).
a = torch.tensor([[1, 3, 4, 5, 6], [2, 7, 5, 4, 3]], dtype=torch.float32)
_show(a)
tensor([1, 2, 3, 4, 5])
a is a vector: torch.Size([5])
torch.int64
tensor([[1, 2, 3, 4, 5]])
torch.Size([1, 5])
torch.int64
tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.]])
torch.Size([5, 1])
torch.float32
tensor([[1., 3., 4., 5., 6.],
        [2., 7., 5., 4., 3.]])
torch.Size([2, 5])
torch.float32
# Rank-3 tensor of standard-normal samples.
a = torch.randn(2, 3, 4)

print(a)
print(a.shape)
print(a.dtype)
# An all-ones 2x3 matrix for comparison.
print(torch.ones(2, 3))
# reshape changes the shape of a tensor without changing the order of its
# data; transposing (permute(1, 0) / .t()) flips the dimension order instead.

x = torch.arange(6)   # tensor([0, 1, 2, 3, 4, 5])
y = x.reshape(2, 3)   # same data viewed as 2 rows x 3 columns
z = y.t()             # transpose of y: equivalent to y.permute(1, 0)
for t in (y, z):
    print(t)
    print(t.shape)
tensor([[0, 1, 2],
        [3, 4, 5]])
torch.Size([2, 3])
tensor([[0, 3],
        [1, 4],
        [2, 5]])
torch.Size([3, 2])
# permute reorders dimensions only; the underlying data is untouched.
x = torch.randn(2, 3, 4)   # e.g. (batch, channel, width)

# Swap the first two axes: result is (channel, batch, width).
y = x.permute(1, 0, 2)

print(y.shape)
torch.Size([3, 2, 4])
# squeeze() drops every dimension of size 1.
x = torch.randn(1, 3, 1, 5)
print(x.shape)
y = x.squeeze()
print(y.shape)


# unsqueeze(0) inserts a new size-1 dimension at the front;
# indexing with None does the same thing.
x = torch.tensor([1, 2, 3])   # shape: (3,)

y = x[None]                   # same as x.unsqueeze(0)
print(y.shape)
torch.Size([1, 3, 1, 5])
torch.Size([3, 5])
torch.Size([1, 3])
x = torch.tensor([[1, 5, 3],
                  [4, 2, 6]])

# torch.max along dim=1 returns a named tuple: the row-wise maxima
# (values) and the column index where each maximum occurs (indices).
print(x.max(dim=1))
torch.return_types.max(
values=tensor([5, 6]),
indices=tensor([1, 2]))

Dataset class in torch#

When we have a CSV file stored on a local disk, we first need to prepare the data so that it can be used as input for a neural network. This means converting the data into a format that PyTorch can understand.

To do this, we start by importing the Dataset class from torch.utils.data. Then we define our own class that inherits from Dataset. This custom class represents our dataset in a way that PyTorch can work with.

The class must implement three special (magic) methods:

__init__

__len__

__getitem__

The __init__ method is responsible for loading and preparing the data. The other two methods, __len__ and __getitem__, allow us to iterate over the dataset and access individual samples.

from torch.utils.data import Dataset
import pandas as pd

class TaxiDataset(Dataset):
    """Taxi-demand samples loaded from a CSV file.

    Each sample is a (features, target) pair: features are
    [hour_of_day, day, row, col] and the target is the demand count.
    """

    def __init__(self, path, transform=None):
        # Read the CSV once and convert the relevant columns into
        # float32 tensors up front.
        self.data = pd.read_csv(path)
        feature_cols = ["hour_of_day", "day", "row", "col"]
        self.features = torch.tensor(self.data[feature_cols].values, dtype=torch.float32)
        self.targets = torch.tensor(self.data[["demand"]].values, dtype=torch.float32)
        self.transform = transform

    def __len__(self):
        # One sample per CSV row.
        return len(self.data)

    def __getitem__(self, idx):
        # Fetch one sample, applying the optional feature transform.
        sample_features = self.features[idx]
        sample_target = self.targets[idx]
        if self.transform:
            sample_features = self.transform(sample_features)
        return sample_features, sample_target
def normalize_features(x):
    """Return a [0, 1]-scaled copy of a 4-element feature row.

    Divisors: hour_of_day by 23 (max hour), day by 365 (days per year),
    row and col by 7 (grid extent).

    Fix: the original mutated its argument in place. Because the dataset
    hands out its stored tensor rows, that permanently rescaled
    self.features on every access (fetching the same sample twice
    normalized it twice). Working on a clone avoids this, matching the
    Taxi2 version of this function below.
    """
    x = x.clone()  # never mutate the caller's (dataset's) storage
    x[0] /= 23
    x[1] /= 365
    x[2] /= 7
    x[3] /= 7
    return x
        

# Fix: the original constructed the dataset twice and immediately discarded
# the first (transform=normalize_features) instance, reading the CSV twice
# for nothing. Only the transform=None dataset is kept, so the sample
# printed below is un-normalized.
dataset = TaxiDataset("../data/train_taxi.csv", transform=None)

print(dataset.features[10])
print(f"Dataset size is {len(dataset)}")
# __getitem__ lets the dataset instance be indexed like a list: dataset[10]
feature, target = dataset[10]
print(f"Sample feature {feature}")
print(f"Sample target{target}")
tensor([18.,  0.,  2.,  7.])
Dataset size is 26930
Sample feature tensor([18.,  0.,  2.,  7.])
Sample targettensor([3.])

Exercise: Create a custom Dataset that:

Separates features and targets Applies different transforms to each

from torch.utils.data import Dataset
import pandas as pd

class Taxi2(Dataset):
    """Taxi-demand dataset with independent feature and target transforms."""

    def __init__(self, path, transformF=None, transformT=None):
        self.data = pd.read_csv(path)
        feature_cols = ["hour_of_day", "day", "row", "col"]
        self.features = torch.tensor(self.data[feature_cols].values, dtype=torch.float32)
        self.targets = torch.tensor(self.data[["demand"]].values, dtype=torch.float32)
        self.transformF = transformF
        self.transformT = transformT

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        feature = self.features[idx]
        target = self.targets[idx]
        # Each optional transform touches only its own tensor, so the
        # order in which they run does not matter.
        if self.transformF:
            feature = self.transformF(feature)
        if self.transformT:
            target = self.transformT(target)
        return feature, target

def normalize_features(x):
    """Return a copy of a 4-element feature row scaled into [0, 1]."""
    out = x.clone()
    # hour / 23, day-of-year / 365, grid row and col / 7
    for i, divisor in enumerate((23, 365, 7, 7)):
        out[i] /= divisor
    return out

def normalize_target(y):
    """Return the demand target squeezed to a scalar and scaled by 1/100.

    Fix: the original did `y = y.squeeze(); y /= 100`. squeeze() returns
    a *view* of the stored tensor, so the in-place division silently
    rescaled the dataset's stored targets every time a sample was
    fetched. Cloning first keeps the dataset's storage intact.
    """
    return y.clone().squeeze() / 100



# Build the dataset with both transforms active.
dataset = Taxi2(
    "../data/train_taxi.csv",
    transformF=normalize_features,
    transformT=normalize_target,
)

print(dataset.features[10])
print(f"Dataset size is {len(dataset)}")
# Indexing goes through __getitem__, so both transforms are applied here.
feature, target = dataset[10]
print(f"Sample feature {feature}")
print(f"Sample target{target}")
tensor([18.,  0.,  2.,  7.])
Dataset size is 26930
Sample feature tensor([0.7826, 0.0000, 0.2857, 1.0000])
Sample target0.029999999329447746

Exercise2: Smart Dataset#

Build a custom Dataset that:

  1. Normalizes features (once), and by row

  2. Returns (feature, target)

from torch.utils.data import Dataset
import pandas as pd

class Taxi3(Dataset):
    """Taxi-demand dataset supporting whole-matrix or per-row feature normalization."""

    def __init__(self, path, feature_norm_row=None, feature_norm_whole=None):
        self.data = pd.read_csv(path)
        feature_cols = ["hour_of_day", "day", "row", "col"]
        self.features = torch.tensor(self.data[feature_cols].values, dtype=torch.float32)
        self.target = torch.tensor(self.data[["demand"]].values, dtype=torch.float32)
        self.normrow = feature_norm_row
        self.normwhole = feature_norm_whole
        # Whole-dataset normalization runs exactly once, up front.
        if self.normwhole:
            self.features = self.normwhole(self.features)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Per-row normalization (if supplied) happens lazily, per access.
        feature = self.features[idx]
        target = self.target[idx]
        if self.normrow:
            feature = self.normrow(feature)
        return feature, target

def normal_once(x):
    """Normalize a whole (N, 4) feature matrix into [0, 1], column-wise.

    Works on a clone so the caller's tensor is untouched.

    Fix: the day column was divided by 356, a typo — a year has 365 days
    (the earlier normalize_features uses 365).
    """
    x = x.clone()
    x[:, 0] /= 23
    x[:, 1] /= 365
    x[:, 2] /= 7
    x[:, 3] /= 7
    return x

def normal_row(x):
    """Normalize a single 4-element feature row into [0, 1].

    Works on a clone so the dataset's stored row is untouched.

    Fix: the day entry was divided by 356, a typo — a year has 365 days.
    """
    x = x.clone()
    x[0] /= 23
    x[1] /= 365
    x[2] /= 7
    x[3] /= 7
    return x
    
# First: normalize the whole feature matrix once in __init__.
dataset3 = Taxi3(
    "../data/train_taxi.csv",
    feature_norm_row=None,
    feature_norm_whole=normal_once,
)

raw = dataset3.data.iloc[10][["hour_of_day", "day", "row", "col"]]
print("raw values:")
print(raw)

print("\nnormalized tensor:")
print(dataset3.features[10])

# Second: normalize each row lazily in __getitem__ instead.
dataset3 = Taxi3(
    "../data/train_taxi.csv",
    feature_norm_row=normal_row,
    feature_norm_whole=None,
)

raw = dataset3.data.iloc[10][["hour_of_day", "day", "row", "col"]]
print("raw values:")
print(raw)

print("\nnormalized tensor:")
print(dataset3[10])
raw values:
hour_of_day    18
day             0
row             2
col             7
Name: 10, dtype: int64

normalized tensor:
tensor([0.7826, 0.0000, 0.2857, 1.0000])
raw values:
hour_of_day    18
day             0
row             2
col             7
Name: 10, dtype: int64

normalized tensor:
(tensor([0.7826, 0.0000, 0.2857, 1.0000]), tensor([3.]))

Note:#

Whole-dataset normalization (in __init__) is generally better for fixed scaling because it is faster and applied once, while row normalization (in __getitem__) is more flexible but less efficient.

Data loader class#

The DataLoader class allows us to load data from a dataset in batches and iterate over it efficiently during training.

It takes a dataset (usually a class derived from Dataset) and provides an iterable that returns mini-batches of data. This is useful because neural networks are typically trained on batches rather than the entire dataset at once.

In addition to batching, the DataLoader can also:

shuffle the data

load data in parallel using multiple workers

automatically collate samples into batches

Because the DataLoader is iterable, we can easily loop over the dataset during training, for example in a training loop.

from torch.utils.data import Dataset, DataLoader

batchsize = 32
# Wrap the dataset so we can iterate over shuffled mini-batches.
dataloader = DataLoader(dataset, batch_size=batchsize, shuffle=True)

# The loader is iterable; peek at just the first batch.
batch_features, batch_targets = next(iter(dataloader))
print(batch_features.shape)
print(batch_targets.shape)
print(batch_features)
print(batch_targets)
torch.Size([32, 4])
torch.Size([32])
tensor([[0.5652, 0.0466, 1.0000, 0.8571],
        [0.2174, 0.0740, 0.1429, 0.5714],
        [0.4348, 0.0521, 0.0000, 0.4286],
        [0.5217, 0.0603, 0.0000, 0.7143],
        [0.7391, 0.0521, 0.4286, 0.2857],
        [0.4348, 0.0548, 0.0000, 0.0000],
        [0.2609, 0.0110, 1.0000, 0.5714],
        [0.9565, 0.0110, 0.2857, 0.0000],
        [0.0000, 0.0384, 0.4286, 0.7143],
        [0.3478, 0.0521, 1.0000, 1.0000],
        [0.9565, 0.0822, 0.5714, 0.1429],
        [0.6957, 0.0384, 0.8571, 0.0000],
        [0.6957, 0.0055, 0.5714, 0.4286],
        [0.6957, 0.0521, 0.2857, 0.8571],
        [0.3913, 0.0247, 0.0000, 0.5714],
        [0.9565, 0.0521, 0.2857, 0.1429],
        [0.1304, 0.0082, 0.5714, 0.8571],
        [0.9130, 0.0301, 1.0000, 0.7143],
        [0.5217, 0.0301, 0.5714, 0.5714],
        [0.2609, 0.0082, 0.4286, 0.4286],
        [0.9565, 0.0027, 0.5714, 0.7143],
        [0.2174, 0.0822, 0.7143, 0.1429],
        [0.6957, 0.0685, 0.2857, 0.2857],
        [0.8261, 0.0575, 1.0000, 0.4286],
        [0.2174, 0.0356, 0.8571, 0.0000],
        [0.9130, 0.0658, 0.0000, 0.5714],
        [0.3913, 0.0411, 0.4286, 0.1429],
        [0.6522, 0.0219, 0.1429, 0.2857],
        [0.1304, 0.0493, 0.5714, 0.5714],
        [0.3043, 0.0630, 0.4286, 0.5714],
        [0.3043, 0.0164, 0.0000, 0.1429],
        [0.8261, 0.0630, 0.1429, 0.1429]])
tensor([0.0018, 0.0000, 0.0045, 0.0085, 0.0020, 0.0000, 0.0004, 0.0000, 0.0046,
        0.0000, 0.0095, 0.0000, 0.0014, 0.0091, 0.0025, 0.0008, 0.0066, 0.0014,
        0.0050, 0.0117, 0.0000, 0.0013, 0.0088, 0.0000, 0.0000, 0.0061, 0.0000,
        0.0031, 0.0000, 0.0049, 0.0011, 0.0000])

First Neural Network: the simplest case#

When we design a new neural network architecture in PyTorch, we need to define a class that inherits from nn.Module. This class must implement two important methods: __init__ and forward.

In the __init__ method, we define the architecture of the model. This includes specifying the layers of the network, their sizes, and the activation functions used between them.

The forward method defines how the input data flows through the network. In other words, it describes the sequence of operations that transforms the input features into the final predictions. The method must be named forward because nn.Module expects this specific name.

The line

super().__init__()

calls the constructor of the parent class nn.Module. This is necessary so that PyTorch can properly initialize the model and register its parameters.

Since layers defined in nn.Module are callable objects, each layer instance can be used like a function when processing data inside the forward method.

Note on how the model is invoked:

When we later call the model like this:

model(x)

PyTorch internally executes the __call__ method of nn.Module, which then calls the forward method of our model.

from torch import nn

class Taxinn(nn.Module):
    """Feed-forward regressor mapping 4 taxi features to 1 demand value."""

    def __init__(self, input_features=4, output_target=1):
        super().__init__()
        # Three hidden layers (widths 10 -> 10 -> 5) with ReLU between them.
        self.fc1 = nn.Linear(input_features, 10)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(10, 10)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(10, 5)
        self.relu3 = nn.ReLU()
        self.output = nn.Linear(5, output_target)

    def forward(self, features):
        # features: [batch_size, in_features] -> [batch_size, out_features]
        hidden = self.relu1(self.fc1(features))
        hidden = self.relu2(self.fc2(hidden))
        hidden = self.relu3(self.fc3(hidden))
        return self.output(hidden)
        
dataloader = DataLoader(dataset, batch_size=batchsize, shuffle=True)
model = Taxinn()
print(model)

# Every learnable tensor, registered automatically by nn.Module.
for name, param in model.named_parameters():
    print(name, param.shape)

# Calling the model invokes nn.Module.__call__, which runs forward().
y_pred = model(torch.Tensor([1, .4, .5, .7]))
print(y_pred)
Taxinn(
  (fc1): Linear(in_features=4, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=10, out_features=10, bias=True)
  (relu2): ReLU()
  (fc3): Linear(in_features=10, out_features=5, bias=True)
  (relu3): ReLU()
  (output): Linear(in_features=5, out_features=1, bias=True)
)
fc1.weight torch.Size([10, 4])
fc1.bias torch.Size([10])
fc2.weight torch.Size([10, 10])
fc2.bias torch.Size([10])
fc3.weight torch.Size([5, 10])
fc3.bias torch.Size([5])
output.weight torch.Size([1, 5])
output.bias torch.Size([1])
tensor([-0.4320], grad_fn=<ViewBackward0>)

Sequential in Neural Network#

PyTorch provides a convenient class called nn.Sequential that allows us to group a sequence of layers into a single module. This is useful when the architecture of the neural network is simply a chain of layers where the output of one layer becomes the input of the next.

In the code above, the layers are arranged in a straightforward feed-forward structure. Each layer processes the output of the previous one, forming a linear pipeline of operations. The network consists of several fully connected (Linear) layers with ReLU activation functions placed between them.

Instead of defining each layer separately and manually passing the data through them in the forward method, we place all layers inside an nn.Sequential container:

self.ffblock = nn.Sequential(…)

This container automatically applies the layers in the order they are defined. As a result, the output of each layer is passed as the input to the next layer.

Because of this, the forward method becomes very simple. We only need to pass the input features through the sequential block:

predictions = self.ffblock(features) return predictions

The sequential block internally performs all intermediate computations, applying each layer one after another.

The line

super().__init__()

initializes the parent class nn.Module, which allows PyTorch to correctly register the model’s parameters and manage them during training.

After defining the model, we create an instance of the class:

model = Taxinn()

Printing the model shows the architecture of the network. We can also inspect all learnable parameters using:

for name, param in model.named_parameters(): print(name, param.shape)

Finally, we test the model by passing a sample input tensor to it. When we call:

model(x)

PyTorch internally invokes the __call__ method of nn.Module, which then executes the forward method of our model to produce the prediction.

from torch import nn

class Taxinn(nn.Module):
    """Same 4 -> 10 -> 10 -> 5 -> 1 regressor, expressed as one nn.Sequential."""

    def __init__(self, input_features=4, output_target=1):
        super().__init__()
        layers = [
            nn.Linear(input_features, 10),
            nn.ReLU(),
            nn.Linear(10, 10),
            nn.ReLU(),
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Linear(5, output_target),
        ]
        # Sequential applies the layers in order, feeding each output
        # into the next layer.
        self.ffblock = nn.Sequential(*layers)

    def forward(self, features):
        # [batch_size, in_features] -> [batch_size, out_features]
        return self.ffblock(features)

dataloader = DataLoader(dataset, batch_size=batchsize, shuffle=True)
model = Taxinn()
print(model)

# Parameter names now carry the Sequential index (ffblock.0, ffblock.2, ...).
for name, param in model.named_parameters():
    print(name, param.shape)

# Single forward pass on a hand-written sample.
y_pred = model(torch.Tensor([1, .4, .5, .7]))
print(y_pred)
Taxinn(
  (ffblock): Sequential(
    (0): Linear(in_features=4, out_features=10, bias=True)
    (1): ReLU()
    (2): Linear(in_features=10, out_features=10, bias=True)
    (3): ReLU()
    (4): Linear(in_features=10, out_features=5, bias=True)
    (5): ReLU()
    (6): Linear(in_features=5, out_features=1, bias=True)
  )
)
ffblock.0.weight torch.Size([10, 4])
ffblock.0.bias torch.Size([10])
ffblock.2.weight torch.Size([10, 10])
ffblock.2.bias torch.Size([10])
ffblock.4.weight torch.Size([5, 10])
ffblock.4.bias torch.Size([5])
ffblock.6.weight torch.Size([1, 5])
ffblock.6.bias torch.Size([1])
tensor([0.2577], grad_fn=<ViewBackward0>)
# A fresh (untrained, randomly initialized) model.
model = Taxinn()

# One hand-built sample: 18:00 on day 110, grid cell (3, 2).
hour_of_day = 18
day = 110
row = 3
col = 2
sample = torch.tensor([hour_of_day, day, row, col], dtype=torch.float32)
features = normalize_features(sample)
y_pred = model(features)
print(y_pred)
# print(type(model))
# dir(model)
tensor([0.1358], grad_fn=<ViewBackward0>)
<class '__main__.Taxinn'>
['T_destination',
 '__annotations__',
 '__call__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_apply',
 '_backward_hooks',
 '_backward_pre_hooks',
 '_buffers',
 '_call_impl',
 '_compiled_call_impl',
 '_forward_hooks',
 '_forward_hooks_always_called',
 '_forward_hooks_with_kwargs',
 '_forward_pre_hooks',
 '_forward_pre_hooks_with_kwargs',
 '_get_backward_hooks',
 '_get_backward_pre_hooks',
 '_get_name',
 '_is_full_backward_hook',
 '_load_from_state_dict',
 '_load_state_dict_post_hooks',
 '_load_state_dict_pre_hooks',
 '_maybe_warn_non_full_backward_hook',
 '_modules',
 '_named_members',
 '_non_persistent_buffers_set',
 '_parameters',
 '_register_load_state_dict_pre_hook',
 '_register_state_dict_hook',
 '_replicate_for_data_parallel',
 '_save_to_state_dict',
 '_slow_forward',
 '_state_dict_hooks',
 '_state_dict_pre_hooks',
 '_version',
 '_wrapped_call_impl',
 'add_module',
 'apply',
 'bfloat16',
 'buffers',
 'call_super_init',
 'children',
 'compile',
 'cpu',
 'cuda',
 'double',
 'dump_patches',
 'eval',
 'extra_repr',
 'ffblock',
 'float',
 'forward',
 'get_buffer',
 'get_extra_state',
 'get_parameter',
 'get_submodule',
 'half',
 'ipu',
 'load_state_dict',
 'modules',
 'mtia',
 'named_buffers',
 'named_children',
 'named_modules',
 'named_parameters',
 'parameters',
 'register_backward_hook',
 'register_buffer',
 'register_forward_hook',
 'register_forward_pre_hook',
 'register_full_backward_hook',
 'register_full_backward_pre_hook',
 'register_load_state_dict_post_hook',
 'register_load_state_dict_pre_hook',
 'register_module',
 'register_parameter',
 'register_state_dict_post_hook',
 'register_state_dict_pre_hook',
 'requires_grad_',
 'set_extra_state',
 'set_submodule',
 'share_memory',
 'state_dict',
 'to',
 'to_empty',
 'train',
 'training',
 'type',
 'xpu',
 'zero_grad']
# Train/validation splits live in separate CSV files; both use the same
# per-row feature normalization.
train_dataset = TaxiDataset(
    "../data/train_taxi.csv",
    transform=normalize_features,
)

validation_dataset = TaxiDataset(
    "../data/validation_taxi.csv",
    transform=normalize_features,
)

# Shuffle only the training data; keep validation order fixed so runs
# are comparable.
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
validation_dataloader = DataLoader(validation_dataset, batch_size=32, shuffle=False)


def train_epoch(dataloader, model, loss_fn, optimizer):
    """Run one training pass over `dataloader` and return the epoch metric.

    The returned/printed value is sqrt(mean per-batch loss), so it is an
    RMSE only when `loss_fn` is nn.MSELoss.

    Fix: the original computed the metric but discarded it; returning it
    lets callers track training progress programmatically.
    """
    model.train()
    train_loss = 0
    for features, target in dataloader:
        optimizer.zero_grad()          # clear gradients from the previous batch
        pred = model(features)         # 1. forward pass
        loss = loss_fn(pred, target)   # 2. compute the loss
        loss.backward()                # 3. back-propagate gradients
        optimizer.step()               # 4. update the weights
        train_loss += loss.item()
    train_loss /= len(dataloader)
    rmse = train_loss ** 0.5
    print(f"Train avg loss: {rmse}")
    return rmse

def evaluate(dataloader, model, loss_fn):
    """Average the loss over `dataloader` and return its square root.

    Fix: the original computed the metric but discarded it; returning it
    lets callers compare epochs or implement early stopping.
    """
    model.eval()
    test_loss = 0
    with torch.no_grad():  # inference only: no gradient bookkeeping needed
        for features, target in dataloader:
            pred = model(features)
            test_loss += loss_fn(pred, target).item()
    test_loss /= len(dataloader)
    rmse = test_loss ** 0.5
    print(f"Test loss is: {rmse}")
    return rmse
            
    

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=5e-4)
epochs = 5

# Fix: the original wrote `for epochs in range(epochs)`, shadowing the
# epoch count with the loop index; use a distinct loop variable.
for epoch in range(epochs):
    print(f"Epochs {epoch + 1} \n-----------------------------------")
    # train for one epoch, then measure validation loss
    train_epoch(train_dataloader, model, loss_fn, optimizer)
    evaluate(validation_dataloader, model, loss_fn)


# model = Taxinn()  # re-running this would replace the trained weights with random ones
hour_of_day = 18
day = 110
row = 3
col = 2
features = normalize_features(
    torch.tensor([hour_of_day, day, row, col], dtype=torch.float32)
)
y_pred = model(features)
print(y_pred)
Epochs 1 
-----------------------------------
Train avg loss: 175.9270909957167
Test loss is: 152.0053015870456
Epochs 2 
-----------------------------------
Train avg loss: 170.44989455214062
Test loss is: 150.54772909862862
Epochs 3 
-----------------------------------
Train avg loss: 169.69403831630999
Test loss is: 150.29895554307066
Epochs 4 
-----------------------------------
Train avg loss: 169.57186176661992
Test loss is: 150.26399244724738
Epochs 5 
-----------------------------------
Train avg loss: 169.54945642660965
Test loss is: 150.2598086757277
tensor([53.5241], grad_fn=<ViewBackward0>)
# Persist only the learned parameters (the recommended way to save a model).
model_state = model.state_dict()
torch.save(model_state, "../data/taxinn.pth")

# Rebuild the architecture, then restore the saved weights into it.
loaded_model = Taxinn()
# weights_only=True restricts unpickling to tensors and primitive types,
# which is safer than the legacy full-pickle path when loading files.
loaded_model_state = torch.load("../data/taxinn.pth", weights_only=True)

loaded_model.load_state_dict(loaded_model_state)
y_pred = loaded_model(features)
print(y_pred)
tensor([53.5241], grad_fn=<ViewBackward0>)
import torch

# torch.accelerator abstracts over the available back-ends (CUDA, MPS, ...).
if torch.accelerator.is_available():
    device = torch.accelerator.current_accelerator().type
else:
    device = "cpu"

print(f"device is: {device}")


print(torch.backends.mps.is_available())
print(torch.backends.mps.is_built())
device is: mps
True
True
import torch


def get_device():
    """Return the best available torch.device: cuda, then mps, then cpu."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = get_device()
print("Using device:", device)
Using device: mps