NumPy

NumPy#

NumPy is one of the most widely used libraries for scientific computing in Python. Even if you don’t use it directly, many other libraries rely on it behind the scenes. It offers fast multidimensional arrays along with tools to work with them efficiently.

Python Lists vs NumPy Arrays#

Python lists are flexible: they can store different types of data in the same container. For example:

import numpy as np
example_list = [1, 1.5, "abc", False, [1.5, True],[2, "python"],[3, [False, "python"]]]
example_list

[1, 1.5, 'abc', False, [1.5, True], [2, 'python'], [3, [False, 'python']]]

This flexibility, however, has a cost. To apply an operation like “add 1” to every element, we must process the items one by one, and the operation may not be defined for all of them. Adding 1 works for numbers (and for booleans, which Python treats as integers), but raises a TypeError for strings and nested lists.
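
As a rough illustration, the sketch below tries to add 1 to each item of a mixed list and records which items fail. The helper name `try_add_one` is made up for this example.

```python
# Hypothetical helper: try "item + 1" and record the outcome for each item.
def try_add_one(items):
    results = []
    for item in items:
        try:
            results.append(item + 1)
        except TypeError:
            results.append("TypeError")  # "+ 1" is not defined for this type
    return results

mixed = [1, 1.5, "abc", False, [1.5, True]]
outcome = try_add_one(mixed)
print(outcome)  # [2, 2.5, 'TypeError', 1, 'TypeError']
```

Note that `False + 1` evaluates to 1, because `bool` is a subclass of `int` in Python.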

In scientific computing we often work with huge collections of numerical values, and we need fast operations on all elements at once. This is where NumPy arrays excel.

What is a NumPy Array?#

A NumPy array is a grid of values, but unlike lists, all elements have the same data type. Arrays are stored efficiently in memory and allow vectorized operations, meaning operations apply to entire groups of values at once.
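
A minimal sketch of both properties, homogeneity and vectorized operations (the variable names are only illustrative):

```python
import numpy as np

# Mixed numeric input is converted to one common dtype.
arr = np.array([1, 1.5, True])
print(arr.dtype)    # float64

# A vectorized operation applies to every element at once, with no explicit loop.
shifted = arr + 1
print(shifted.tolist())   # [2.0, 2.5, 2.0]
```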

A NumPy array has three key attributes:

dtype — data type. Arrays always contain one type (arrays are homogeneous).
shape — Dimensions of the array, e.g. (3,2), (3,4,500), or ().
data — raw data storage in memory. This can be passed to C or Fortran code for efficient calculations.
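
The three attributes can be inspected directly. The following is a small sketch; the array `m` and the scalar array `s` are arbitrary examples.

```python
import numpy as np

m = np.zeros((3, 2))
print(m.dtype)        # float64, the default dtype of np.zeros
print(m.shape)        # (3, 2)
print(type(m.data))   # <class 'memoryview'>: a buffer over the raw bytes
print(m.nbytes)       # 48 = 6 elements * 8 bytes each

s = np.array(5.0)     # a zero-dimensional array
print(s.shape)        # ()
```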

Performance Test: Python vs NumPy#

To compare the performance of a pure Python list with NumPy, consider the following code. First we make a list of the numbers from 0 to 99999 (a) and a list of zeros of the same length (b).

a = list(range(100000))
b = [ 0 ] * 100000

The following cell fills the list b with the squares of the values in a, and the %%timeit cell magic measures the running time.

%%timeit
for i in range(len(a)):
  b[i] = a[i]**2

30.1 ms ± 48.9 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Now let's compare with NumPy. Create an array

import numpy as np
a = np.arange(10000)

and then calculate the square of each element.

%%timeit
b = a ** 2

4.17 μs ± 3.01 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

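The NumPy version is faster by orders of magnitude. Note that the list above has 100,000 elements while the array has only 10,000, so the two timings are not directly comparable; even after allowing for the tenfold size difference, the explicit Python loop is still hundreds of times slower.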
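
Outside a notebook, a like-for-like comparison can be sketched with the standard `timeit` module. The size `n`, the function names and the explicit `int64` dtype are choices made for this sketch, and the timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

n = 100_000  # same number of elements for both containers
values_list = list(range(n))
values_arr = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow

def square_list():
    # pure Python: one multiplication per loop iteration
    return [v ** 2 for v in values_list]

def square_array():
    # NumPy: one vectorized operation over the whole array
    return values_arr ** 2

t_list = timeit.timeit(square_list, number=10)
t_arr = timeit.timeit(square_array, number=10)
print(f"list : {t_list:.4f} s for 10 runs")
print(f"array: {t_arr:.4f} s for 10 runs")
```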

🧮 NumPy Fundamentals#

What is a NumPy array?#

A NumPy array (an ndarray) is an N-dimensional, homogeneous, fixed-type container for data.
Arrays are stored in contiguous memory, which enables fast vectorized operations.

Creating arrays (some common methods)#

numpy.array() — from Python lists/tuples
numpy.zeros(shape) — all zeros
numpy.ones(shape) — all ones
numpy.full(shape, fill_value) — filled with a constant
numpy.eye(n) — identity matrix (2D)
numpy.arange(start, stop, step) — range of values (1D)
numpy.linspace(start, stop, num) — evenly spaced values between start and stop (1D)
Many more, e.g. numpy.empty and the functions in numpy.random.

Key array attributes & metadata#

Given an array a, you commonly use:

a.ndim — number of dimensions (axes)
a.shape — tuple of sizes for each dimension, e.g. (3,4) for a 3×4 matrix
a.size — total number of elements (product of shape entries)
a.dtype — data type of the elements (all elements have the same dtype)

import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.ndim, a.shape, a.size, a.dtype)

2 (2, 3) 6 int64

np.zeros((2, 3))             # 2x3 array with all elements 0
np.ones((1,2))               # 1x2 array with all elements 1
np.full((2,2),7)             # 2x2 array with all elements 7
np.eye(2)                    # 2x2 identity matrix

np.arange(10)                # Evenly spaced values in an interval
np.linspace(0,9,10)          # same values as above, but as floats (see exercise)

c = np.ones((3,3))
d = np.ones((3, 2), 'bool')  # 3x2 boolean array

Arrays can also be stored to and loaded from a .npy file using:

numpy.save()
numpy.load()

np.save('x.npy', a)           # save the array a to a .npy file
x = np.load('x.npy')          # load an array from a .npy file and store it in variable x
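
A round trip through a .npy file preserves values, dtype and shape. The sketch below writes to a temporary directory so that it leaves no files behind; the file name `example.npy` is arbitrary.

```python
import os
import tempfile
import numpy as np

arr = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")  # hypothetical file name
    np.save(path, arr)
    restored = np.load(path)   # data is read fully into memory here

print(np.array_equal(arr, restored))    # True
print(restored.dtype == arr.dtype)      # True
print(restored.shape)                   # (2, 3)
```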

On many occasions (especially when something behaves differently than expected), it is useful to check and control the data type of an array using:

numpy.ndarray.dtype
numpy.ndarray.astype()

d.dtype                    # datatype of the array

dtype('bool')

d.astype('int')            # change datatype from boolean to integer

array([[1, 1],
       [1, 1],
       [1, 1]])

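In the last example, .astype('int') makes a copy of the array and allocates new memory. By default, astype always returns a newly allocated array, even when the target dtype is identical to the original; passing copy=False returns the input array itself when no conversion is needed.
Understanding and minimizing copies is one of the most important practices for performance.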

🟩 Exercises: NumPy-1

Datatypes — Try np.arange(10) and np.linspace(0,9,10).
What is the difference? Can you make one behave like the other?
Datatypes — Create a 3×2 array of floats (numpy.random.random())
and convert it to integers using .astype(int). How does it change?
Reshape — Create a 3×2 integer array (range 0–10) and change shape
using .reshape(). Which shapes are not allowed?
NumPy I/O — Save the array using numpy.save() and load it back with numpy.load().

Solutions

# 1. Difference between arange & linspace
np.arange(10)               # → integers (step 1): 0, 1, ..., 9
np.linspace(0,9,10)         # → 10 floats evenly spaced between 0 and 9 (both ends included)
np.arange(10, dtype=float)  # → same values as the linspace call

# 2. Create random array & convert type
arr = np.random.random((3,2))
arr_int = arr.astype(int)

# 3. Reshape example
x = np.random.randint(0,10,(3,2))
x.reshape(6,)   # works
# x.reshape(4,2) # ❌ fails — size mismatch

# 4. Save & load
np.save("data.npy", x)
loaded = np.load("data.npy")

🔥 Copying in NumPy#

Understanding when NumPy copies data and when it returns a view is crucial for performance.
A copy duplicates memory (slower, more RAM).
A view shares memory (faster, no duplication).

1️⃣ Copy — new memory allocated#

A copy creates an independent array.

Changing one does NOT affect the other
More memory used
Slower than a view

import numpy as np

a = np.array([1,2,3,4])
b = a.copy()         # full copy

b[0] = 999

print("a =", a)  
print("b =", b)      # different → independent arrays

a = [1 2 3 4]
b = [999   2   3   4]

2️⃣ View — NO new memory allocated#

A view shares memory with the original array.

Very fast
No extra memory
⚠ Modifying one changes the other

c = a.view()         # view — shares memory
c[0] = 555

print("a =", a)      # changed!
print("c =", c)

a = [555   2   3   4]
c = [555   2   3   4]
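
Whether two arrays share memory can be checked without modifying either one. This sketch uses `np.shares_memory` and the `.base` attribute; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
v = a.view()
cp = a.copy()

print(np.shares_memory(a, v))    # True: the view uses a's buffer
print(np.shares_memory(a, cp))   # False: the copy has its own buffer
print(v.base is a)               # True: .base points to the array that owns the data
print(cp.base is None)           # True: the copy owns its data
```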

3️⃣ `astype()` and Copying#

Converting dtype usually forces a copy, because data must be rewritten.

Operation	Copy?
`astype(int)` on bool	✔ Yes
`astype(float)` on int	✔ Yes
`astype(same_type)`	✔ Yes by default (no copy only with `copy=False`)

d = np.array([True, False, True])

d2 = d.astype(int)     # bool → int → copy
print("Original:", d)
print("Converted:", d2)
print("Same object?", d2 is d)

Original: [ True False  True]
Converted: [1 0 1]
Same object? False
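
The copy behaviour of `astype` can be checked directly. By default it returns a newly allocated array even when the dtype is unchanged; `copy=False` skips the copy only when no conversion is needed. A minimal sketch:

```python
import numpy as np

x = np.arange(5, dtype=np.float64)

same_default = x.astype(np.float64)              # copy=True is the default
same_nocopy = x.astype(np.float64, copy=False)   # dtype already matches
converted = x.astype(np.int64, copy=False)       # a real conversion still copies

print(np.shares_memory(x, same_default))  # False
print(same_nocopy is x)                   # True
print(np.shares_memory(x, converted))     # False
```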

4️⃣ Performance: Same type vs Type conversion#

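Converting to the same dtype still copies by default, but it is a plain memory copy with no per-element conversion.
Changing dtype is slower (conversion + copy into new memory); use astype(..., copy=False) to skip the copy when the dtype already matches.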

import timeit

arr = np.random.rand(5_000_000).astype('float64')

print("Same dtype :", timeit.timeit("arr.astype('float64')", globals=globals(), number=5))
print("New dtype  :", timeit.timeit("arr.astype('int64')",   globals=globals(), number=5))

Same dtype : 0.015976292022969574
New dtype  : 0.021505624987185

🧠 Summary: When does NumPy copy?#

Operation	View or Copy?
`array.copy()`	🔥 Always a copy
Slicing `arr[1:5]`	🔍 View (no copy)
`astype(new_dtype)`	🔥 Copy
`astype(same_dtype)`	🔥 Copy by default (⚡ no copy with `copy=False`)
`.reshape()`	Usually view (but sometimes copy)
Fancy indexing `arr[[1,5,7]]`	❗ Copy
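
The slicing, reshape and fancy-indexing rows of the table can be verified with `np.shares_memory`. The sketch below also shows one case where `.reshape()` has to copy: the data of a transposed array is not contiguous, so it cannot be flattened in place.

```python
import numpy as np

arr = np.arange(12)

sl = arr[1:5]                              # basic slice
fancy = arr[[1, 5, 7]]                     # fancy indexing
r_view = arr.reshape(3, 4)                 # contiguous data: reinterpreted in place
r_copy = arr.reshape(3, 4).T.reshape(12)   # transposed data: reshape must copy

for name, other in [("slice", sl), ("fancy", fancy),
                    ("reshape", r_view), ("reshape after .T", r_copy)]:
    print(f"{name:17s} shares memory: {np.shares_memory(arr, other)}")
```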

Efficient NumPy == avoiding unnecessary copies 🚀

Accessing and modifying data (indexing and slicing)#

Once an array is created, we often need to read or change parts of it. NumPy supports powerful ways to access and modify elements:

Basic indexing (single elements, rows, columns)
Slicing (subarrays using start:stop:step)
Boolean masking (select elements matching a condition)
Fancy indexing (selecting with index lists or arrays)
In-place modification (changing values without creating a new array)

Basic indexing (1D and 2D)#

For a 1D array, indexing works like Python lists:

a[0] — first element
a[-1] — last element

For a 2D array (matrix):

a[i, j] — element in row i, column j
a[i] — the i-th row

Slicing#

Slicing uses the start:stop:step syntax:

a[1:4] — elements with indices 1, 2, 3
a[:3] — first three elements
a[::2] — every second element

In 2D:

a[1:, :] — all rows from index 1, all columns
a[:, 1:3] — all rows, columns 1 and 2

Basic slices in NumPy are views, not copies: they share memory with the original array, so writing to a slice also changes the original.
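
A short sketch of what sharing memory means in practice, using a 2D slice (the names `grid`, `block` and `safe` are illustrative):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
block = grid[1:, 1:3]      # rows 1..2, columns 1..2
block[:] = 0               # writing through the slice changes grid as well

print(grid.tolist())
# [[0, 1, 2, 3], [4, 0, 0, 7], [8, 0, 0, 11]]

safe = grid[1:, 1:3].copy()  # take an explicit copy when grid must stay intact
safe[:] = -1
print(grid[1, 1])            # 0, unaffected by the change to safe
```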

Boolean masking#

We can build an array of booleans based on a condition and use it to filter:

mask = (a > 0)
a[mask] — elements where the condition is True

This is very useful for selecting or modifying values that satisfy some condition.
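
Masks can also be combined. The sketch below uses the element-wise operators `&` (and) and `~` (not); the parentheses around each comparison are required because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

t = np.array([3, -1, 0, 7, -5, 2])

in_range = (t > 0) & (t < 5)        # positive AND smaller than 5
print(t[in_range].tolist())         # [3, 2]
print(np.count_nonzero(in_range))   # 2 elements match
print(t[~in_range].tolist())        # [-1, 0, 7, -5]
```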

Fancy indexing#

Fancy indexing uses integer arrays or lists as indices:

a[[0, 2, 4]] — elements at positions 0, 2, and 4
Can be used along multiple axes in multi-dimensional arrays.

Fancy indexing returns a copy, not a view.
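
For multi-dimensional arrays, index lists given for several axes are paired element by element. A small sketch on a 3×4 array:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

# Row indices and column indices are paired: this picks b[0, 1] and b[2, 3].
pairs = b[[0, 2], [1, 3]]
print(pairs.tolist())     # [1, 11]

# An index list on one axis combined with a slice selects whole rows.
rows = b[[0, 2], :]
print(rows.tolist())      # [[0, 1, 2, 3], [8, 9, 10, 11]]
```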

Modifying arrays in-place#

All of these indexing methods can be used on the left-hand side of an assignment:

a[0] = 10
a[1:4] = 0
a[a < 0] = 0

This changes the original array in-place (no new array is created).
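
One consequence of in-place assignment is that the array keeps its dtype: assigned values are converted to fit the array, not the other way round. A minimal sketch with an integer array:

```python
import numpy as np

a = np.arange(5)           # integer array: 0, 1, 2, 3, 4
dtype_before = a.dtype

a[0] = 3.7                 # converted to the integer dtype, so 3 is stored
a[1:3] = [10, 20]          # a slice can take a sequence of matching length
a[a > 15] = -1             # mask assignment: only 20 is replaced

print(a.tolist())              # [3, 10, -1, 3, 4]
print(a.dtype == dtype_before) # True: the dtype did not change
```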

import numpy as np

# 1D array
a = np.arange(10)
print("a =", a)

print("a[0]      =", a[0])      # first element
print("a[-1]     =", a[-1])     # last element
print("a[2:7]    =", a[2:7])    # elements with indices 2..6
print("a[:5]     =", a[:5])     # first 5 elements
print("a[::2]    =", a[::2])    # step of 2 (even indices)

a = [0 1 2 3 4 5 6 7 8 9]
a[0]      = 0
a[-1]     = 9
a[2:7]    = [2 3 4 5 6]
a[:5]     = [0 1 2 3 4]
a[::2]    = [0 2 4 6 8]

b = np.arange(12).reshape(3, 4)
print("b =\n", b)

print("b[0, 0]   =", b[0, 0])    # element in first row, first column
print("b[1]      =", b[1])       # second row
print("b[:, 1]   =", b[:, 1])    # all rows, second column
print("b[1:, 2:] =\n", b[1:, 2:])  # subarray: rows 1..end, cols 2..end

b =
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
b[0, 0]   = 0
b[1]      = [4 5 6 7]
b[:, 1]   = [1 5 9]
b[1:, 2:] =
 [[ 6  7]
 [10 11]]

c = np.array([3, -1, 0, 7, -5, 2])
print("c =", c)

mask = (c > 0)
print("mask       =", mask)
print("c[mask]    =", c[mask])   # all positive elements

# modify in-place: set all negative values to 0
c[c < 0] = 0
print("c after c[c < 0] = 0 →", c)

c = [ 3 -1  0  7 -5  2]
mask       = [ True False False  True False  True]
c[mask]    = [3 7 2]
c after c[c < 0] = 0 → [3 0 0 7 0 2]

d = np.linspace(0, 1, 6)
print("d =", d)

indices = [0, 2, 4]
print("selected =", d[indices])    # elements at indices 0, 2, 4

# fancy indexing returns a copy, so modifying 'sel' won't change 'd'
sel = d[indices]
sel[0] = 999
print("sel        =", sel)
print("d (unchanged) =", d)

d = [0.  0.2 0.4 0.6 0.8 1. ]
selected = [0.  0.4 0.8]
sel        = [9.99e+02 4.00e-01 8.00e-01]
d (unchanged) = [0.  0.2 0.4 0.6 0.8 1. ]

e = np.arange(10)
print("e =", e)

# set a slice to a single value
e[2:5] = 99
print("e after e[2:5] = 99 →", e)

# reset
e = np.arange(10)

# multiply a slice
e[1:6] *= 10
print("e after e[1:6] *= 10 →", e)

# use boolean mask to change only even numbers
mask_even = (e % 2 == 0)
e[mask_even] = -1
print("e after e[mask_even] = -1 →", e)

e = [0 1 2 3 4 5 6 7 8 9]
e after e[2:5] = 99 → [ 0  1 99 99 99  5  6  7  8  9]
e after e[1:6] *= 10 → [ 0 10 20 30 40 50  6  7  8  9]
e after e[mask_even] = -1 → [-1 -1 -1 -1 -1 -1 -1  7 -1  9]