NumPy#
NumPy is one of the most widely used libraries for scientific computing in Python. Even if you don't use it directly, many other libraries rely on it behind the scenes. It offers fast multidimensional arrays along with tools to work with them efficiently.
Python Lists vs NumPy Arrays#
Python lists are flexible: they can store different types of data in the same container. For example:
import numpy as np
example_list = [1, 1.5, "abc", False, [1.5, True],[2, "python"],[3, [False, "python"]]]
example_list
[1, 1.5, 'abc', False, [1.5, True], [2, 'python'], [3, [False, 'python']]]
This flexibility, however, comes at a cost. If we try applying an operation like "add 1" to every element, we must process each item one by one. Adding 1 works for numbers (and for booleans, which Python treats as the integers 0 and 1), but not for strings or nested lists.
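A minimal sketch of that limitation, using a shortened version of the list above. The helper name `add_one_to_each` is made up for illustration. Note that `False + 1` evaluates to 1, because Python treats booleans as integers, while strings and nested lists raise a `TypeError`.

```python
def add_one_to_each(items):
    """Try to add 1 to every item; record None where the operation fails."""
    results = []
    for item in items:
        try:
            results.append(item + 1)
        except TypeError:
            # strings and nested lists do not support `+ 1`
            results.append(None)
    return results


mixed = [1, 1.5, "abc", False, [1.5, True]]
outcome = add_one_to_each(mixed)
print(outcome)  # [2, 2.5, None, 1, None]
```

Every element has to be inspected individually, which is exactly the per-item overhead that a homogeneous array avoids.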
In scientific computing we often work with huge collections of numerical values, and we need fast operations on all elements at once. This is where NumPy arrays excel.
What is a NumPy Array?#
A NumPy array is a grid of values, but unlike lists, all elements have the same data type. Arrays are stored efficiently in memory and allow vectorized operations, meaning operations apply to entire groups of values at once.
A NumPy array has three key attributes:
dtype: data type. Arrays always contain one type (arrays are homogeneous).
shape: dimensions of the array, e.g. (3, 2), (3, 4, 500), or ().
data: raw data storage in memory. This can be passed to C or Fortran code for efficient calculations.
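The three attributes can be inspected directly. The sketch below is only illustrative; the variable names `matrix` and `scalar` are assumptions, and `nbytes` is used here as a convenient way to see how large the raw data buffer is.

```python
import numpy as np

matrix = np.zeros((3, 2))   # 2-D array: 3 rows, 2 columns, default dtype float64
scalar = np.array(5.0)      # 0-D array: its shape is the empty tuple

print(matrix.dtype, matrix.shape)  # float64 (3, 2)
print(scalar.shape)                # ()
print(matrix.nbytes)               # 48, i.e. 6 elements * 8 bytes each
```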
Performance Test: Python vs NumPy#
To compare the performance of a list in pure Python with NumPy, let's look at the following code. First we make a list of the numbers from 0 to 99999 (a) and a list of zeros of the same length (b).
a = list(range(100000))
b = [ 0 ] * 100000
The following cell fills the list b with the squares of the values in a, and the %%timeit cell magic measures the running time.
%%timeit
for i in range(len(a)):
b[i] = a[i]**2
30.1 ms ± 48.9 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now let's compare with NumPy. Create an array
import numpy as np
a = np.arange(10000)
and then calculate the square of each element.
%%timeit
b = a ** 2
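4.17 μs ± 3.01 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
We see that, compared to working with NumPy arrays, working with traditional Python lists is slow. Note that the array here has 10,000 elements while the list above has 100,000, so the two timings are not a like-for-like comparison; even allowing for that factor of ten, the NumPy version comes out roughly several hundred times faster.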
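For a like-for-like comparison outside a notebook, the standard-library `timeit` module can time both approaches on the same number of elements. This is a rough sketch, not a rigorous benchmark: the function names and the choice of 10 repetitions are assumptions, and the absolute numbers depend on the machine.

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)  # explicit dtype so the squares cannot overflow


def square_with_loop():
    """Square every element with an explicit Python loop."""
    out = [0] * n
    for i in range(n):
        out[i] = values_list[i] ** 2
    return out


def square_with_numpy():
    """Square every element with one vectorized operation."""
    return values_array ** 2


loop_time = timeit.timeit(square_with_loop, number=10)
numpy_time = timeit.timeit(square_with_numpy, number=10)
print(f"loop : {loop_time:.4f} s for 10 runs")
print(f"numpy: {numpy_time:.4f} s for 10 runs")
```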
NumPy Fundamentals#
What is a NumPy array?#
A NumPy array (an ndarray) is an N-dimensional, homogeneous, fixed-type container for data.
Arrays are stored in contiguous memory, which enables fast vectorized operations.
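The memory layout can be inspected through the `flags` and `strides` attributes. The sketch below assumes an `int64` array (8 bytes per element) and shows that a freshly created array is contiguous, while a view such as a transpose reuses the same memory with different strides and is no longer C-contiguous.

```python
import numpy as np

grid = np.arange(12, dtype=np.int64).reshape(3, 4)
print(grid.flags["C_CONTIGUOUS"])        # True
print(grid.strides)                      # (32, 8): 32 bytes to the next row, 8 to the next column

transposed = grid.T                      # same memory, axes swapped
print(transposed.flags["C_CONTIGUOUS"])  # False
print(transposed.strides)                # (8, 32)
```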
Creating arrays (some common methods)#
numpy.array(): from Python lists/tuples
numpy.zeros(shape): all zeros
numpy.ones(shape): all ones
numpy.full(shape, fill_value): filled with a constant
numpy.eye(n): identity matrix (2D)
numpy.arange(start, stop, step): range of values (1D)
numpy.linspace(start, stop, num): evenly spaced values between start and stop (1D)
Many more: random, empty, etc.
Key array attributes & metadata#
Given an array a, you commonly use:
a.ndim: number of dimensions (axes)
a.shape: tuple of sizes for each dimension, e.g. (3, 4) for a 3×4 matrix
a.size: total number of elements (product of shape entries)
a.dtype: data type of the elements (all elements have the same dtype)
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print(a.ndim, a.shape, a.size, a.dtype)
2 (2, 3) 6 int64
np.zeros((2, 3)) # 2x3 array with all elements 0
np.ones((1,2)) # 1x2 array with all elements 1
np.full((2,2),7) # 2x2 array with all elements 7
np.eye(2) # 2x2 identity matrix
np.arange(10) # Evenly spaced values in an interval
np.linspace(0,9,10) # 10 evenly spaced values from 0 to 9 (compare with arange in the exercise)
c = np.ones((3,3))
d = np.ones((3, 2), 'bool') # 3x2 boolean array
Arrays can also be stored to and loaded from a .npy file using:
numpy.save()
numpy.load()
np.save('x.npy', a) # save the array a to a .npy file
x = np.load('x.npy') # load an array from a .npy file and store it in variable x
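A quick round-trip check can confirm that dtype and shape survive saving and loading. This sketch writes into a temporary directory so that no file is left behind; the file name `example.npy` and the variable names are illustrative.

```python
import os
import tempfile

import numpy as np

original = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")
    np.save(path, original)   # writes data plus dtype/shape metadata
    restored = np.load(path)  # reads the whole array back into memory

print(restored.dtype, restored.shape)
print(np.array_equal(original, restored))
```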
On many occasions (especially when something behaves differently than expected), it is useful to check and control the data type of an array using:
numpy.ndarray.dtype
numpy.ndarray.astype()
d.dtype # datatype of the array
dtype('bool')
d.astype('int') # change datatype from boolean to integer
array([[1, 1],
[1, 1],
[1, 1]])
In the last example, using .astype('int'), NumPy makes a copy of the array and allocates new memory. By default astype() does this even when the target dtype is identical to the original; pass copy=False to skip the copy when no conversion is needed.
Understanding and minimizing copies is one of the most important practices for performance.
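Whether `astype()` copied can be checked with `np.shares_memory`. The sketch below reuses the idea of the boolean array from above under illustrative variable names; it relies on the documented `copy` parameter of `ndarray.astype`, which defaults to `True`.

```python
import numpy as np

flags = np.ones((3, 2), dtype=bool)

as_int = flags.astype(int)                     # new dtype: data must be converted, so a new array
same_default = flags.astype(bool)              # same dtype, but copy=True is the default
same_no_copy = flags.astype(bool, copy=False)  # same dtype, copy explicitly disabled

print(np.shares_memory(flags, as_int))        # False
print(np.shares_memory(flags, same_default))  # False
print(same_no_copy is flags)                  # True: the original array is returned
```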
Exercises: NumPy-1
Datatypes: Try np.arange(10) and np.linspace(0,9,10). What is the difference? Can you make one behave like the other?
Datatypes: Create a 3×2 array of floats (numpy.random.random()) and convert it to integers using .astype(int). How does it change?
Reshape: Create a 3×2 integer array (range 0 to 10) and change its shape using .reshape(). Which shapes are not allowed?
NumPy I/O: Save the array using numpy.save() and load it back with numpy.load().
Solutions
# 1. Difference between arange & linspace
np.arange(10) # → integers, step 1
np.linspace(0,9,10) # → 10 floats evenly spaced between 0 and 9
# 2. Create random array & convert type
arr = np.random.random((3,2))
arr_int = arr.astype(int)
# 3. Reshape example
x = np.random.randint(0,10,(3,2))
x.reshape(6,) # works
# x.reshape(4,2) # fails: size mismatch (6 elements cannot fill a 4×2 array)
# 4. Save & load
np.save("data.npy", x)
loaded = np.load("data.npy")
Copying in NumPy#
Understanding when NumPy copies data and when it returns a view is crucial for performance.
A copy duplicates memory (slower, more RAM).
A view shares memory (faster, no duplication).
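One way to tell the two apart at runtime is `np.shares_memory`, together with the `base` attribute, which points at the array that owns the data. A small sketch with illustrative variable names:

```python
import numpy as np

source = np.arange(5)
as_view = source[1:4]         # slicing returns a view
as_copy = source[1:4].copy()  # explicit copy owns its own data

print(np.shares_memory(source, as_view))  # True
print(np.shares_memory(source, as_copy))  # False
print(as_view.base is source)             # True: the view's data belongs to `source`
print(as_copy.base is None)               # True: the copy owns its data
```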
1. Copy: new memory allocated#
A copy creates an independent array.
Changing one does NOT affect the other
More memory used
Slower than a view
import numpy as np
a = np.array([1,2,3,4])
b = a.copy() # full copy
b[0] = 999
print("a =", a)
print("b =", b) # different: independent arrays
a = [1 2 3 4]
b = [999 2 3 4]
2. View: NO new memory allocated#
A view shares memory with the original array.
Very fast
No extra memory
Modifying one changes the other
c = a.view() # view: shares memory
c[0] = 555
print("a =", a) # changed!
print("c =", c)
a = [555 2 3 4]
c = [555 2 3 4]
3. astype() and Copying#
Converting dtype usually forces a copy, because data must be rewritten.
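| Operation | Copy? |
|---|---|
| a.astype(new_dtype) | Yes |
| a.astype(same_dtype) | Yes (copy=True is the default) |
| a.astype(same_dtype, copy=False) | No copy |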
d = np.array([True, False, True])
d2 = d.astype(int) # bool → int: copy
print("Original:", d)
print("Converted:", d2)
print("Same object?", d2 is d)
Original: [ True False True]
Converted: [1 0 1]
Same object? False
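4. Performance: Same type vs Type conversion#
Converting to the same dtype is faster: astype() still copies by default, but the copy is a plain memory transfer with no per-element conversion.
Changing dtype is slower (conversion + copy into new memory).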
import timeit
arr = np.random.rand(5_000_000).astype('float64')
print("Same dtype :", timeit.timeit("arr.astype('float64')", globals=globals(), number=5))
print("New dtype :", timeit.timeit("arr.astype('int64')", globals=globals(), number=5))
Same dtype : 0.015976292022969574
New dtype : 0.021505624987185
Summary: When does NumPy copy?#
| Operation | View or Copy? |
|---|---|
| a.copy() | Always a copy |
| Slicing | View (no copy) |
| a.astype(new_dtype) | Copy |
| a.astype(same_dtype, copy=False) | No copy |
| a.reshape(...) | Usually view (but sometimes copy) |
| Fancy indexing | Copy |
Efficient NumPy == avoiding unnecessary copies
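The "usually view, but sometimes copy" case is typical of `reshape()`. The sketch below, with illustrative variable names, shows one situation of each kind: reshaping a contiguous array gives a view, while flattening a transposed array cannot be described by strides alone and therefore produces a copy.

```python
import numpy as np

base = np.arange(6)
reshaped = base.reshape(2, 3)             # contiguous data: reshape returns a view
print(np.shares_memory(base, reshaped))   # True

transposed = reshaped.T                   # still a view, with swapped strides
flattened = transposed.reshape(6)         # element order 0, 3, 1, 4, 2, 5 needs new memory
print(np.shares_memory(base, flattened))  # False
print(flattened)                          # [0 3 1 4 2 5]
```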
Accessing and modifying data (indexing and slicing)#
Once an array is created, we often need to read or change parts of it. NumPy supports powerful ways to access and modify elements:
Basic indexing (single elements, rows, columns)
Slicing (subarrays using start:stop:step)
Boolean masking (select elements matching a condition)
Fancy indexing (selecting with index lists or arrays)
In-place modification (changing values without creating a new array)
Basic indexing (1D and 2D)#
For a 1D array, indexing works like Python lists:
a[0]: first element
a[-1]: last element
For a 2D array (matrix):
a[i, j]: element in row i, column j
a[i]: the i-th row
Slicing#
Slicing uses the start:stop:step syntax:
a[1:4]: elements with indices 1, 2, 3
a[:3]: first three elements
a[::2]: every second element
In 2D:
a[1:, :]: all rows from index 1, all columns
a[:, 1:3]: all rows, columns 1 and 2
Basic slices in NumPy are views, not copies: they share memory with the original array.
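Because a slice shares memory with its source, writing into the slice writes into the original array. A small sketch with illustrative names; add `.copy()` to the slice when an independent sub-array is needed.

```python
import numpy as np

table = np.arange(12).reshape(3, 4)
block = table[1:, 1:3]  # rows 1..2, columns 1..2: a view into `table`
block[:] = 0            # writes through to `table`

print(table)
# [[ 0  1  2  3]
#  [ 4  0  0  7]
#  [ 8  0  0 11]]
```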
Boolean masking#
We can build an array of booleans based on a condition and use it to filter:
mask = (a > 0)
a[mask]: elements where the condition is True
This is very useful for selecting or modifying values that satisfy some condition.
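Conditions can be combined element-wise with `&` (and), `|` (or) and `~` (not). The parentheses around each comparison are required because these operators bind more tightly than `>` and `<`. A short sketch with made-up data:

```python
import numpy as np

readings = np.array([3, -1, 0, 7, -5, 2])

in_range = (readings > 0) & (readings < 5)  # positive AND below 5
print(readings[in_range])                   # [3 2]

outside = ~in_range                         # logical NOT of the mask
print(readings[outside])                    # [-1  0  7 -5]
```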
Fancy indexing#
Fancy indexing uses integer arrays or lists as indices:
a[[0, 2, 4]]: elements at positions 0, 2, and 4
Can be used along multiple axes in multi-dimensional arrays.
Fancy indexing returns a copy, not a view.
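With several index lists, NumPy pairs them up position by position, so `grid[rows, cols]` picks individual elements rather than a rectangular block. A sketch on a small 3×4 array, with illustrative names:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

rows = [0, 1, 2]
cols = [0, 2, 3]
picked = grid[rows, cols]  # elements at (0, 0), (1, 2), (2, 3)
print(picked)              # [ 0  6 11]

whole_rows = grid[[0, 2]]  # a single index list selects entire rows 0 and 2
print(whole_rows.shape)    # (2, 4)

print(np.shares_memory(grid, picked))  # False: fancy indexing copies
```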
Modifying arrays in-place#
All of these indexing methods can be used on the left-hand side of an assignment:
a[0] = 10
a[1:4] = 0
a[a < 0] = 0
This changes the original array in-place (no new array is created).
import numpy as np
# 1D array
a = np.arange(10)
print("a =", a)
print("a[0] =", a[0]) # first element
print("a[-1] =", a[-1]) # last element
print("a[2:7] =", a[2:7]) # elements with indices 2..6
print("a[:5] =", a[:5]) # first 5 elements
print("a[::2] =", a[::2]) # step of 2 (even indices)
a = [0 1 2 3 4 5 6 7 8 9]
a[0] = 0
a[-1] = 9
a[2:7] = [2 3 4 5 6]
a[:5] = [0 1 2 3 4]
a[::2] = [0 2 4 6 8]
b = np.arange(12).reshape(3, 4)
print("b =\n", b)
print("b[0, 0] =", b[0, 0]) # element in first row, first column
print("b[1] =", b[1]) # second row
print("b[:, 1] =", b[:, 1]) # all rows, second column
print("b[1:, 2:] =\n", b[1:, 2:]) # subarray: rows 1..end, cols 2..end
b =
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
b[0, 0] = 0
b[1] = [4 5 6 7]
b[:, 1] = [1 5 9]
b[1:, 2:] =
[[ 6 7]
[10 11]]
c = np.array([3, -1, 0, 7, -5, 2])
print("c =", c)
mask = (c > 0)
print("mask =", mask)
print("c[mask] =", c[mask]) # all positive elements
# modify in-place: set all negative values to 0
c[c < 0] = 0
print("c after c[c < 0] = 0 →", c)
c = [ 3 -1 0 7 -5 2]
mask = [ True False False True False True]
c[mask] = [3 7 2]
c after c[c < 0] = 0 → [3 0 0 7 0 2]
d = np.linspace(0, 1, 6)
print("d =", d)
indices = [0, 2, 4]
print("selected =", d[indices]) # elements at indices 0, 2, 4
# fancy indexing returns a copy, so modifying 'sel' won't change 'd'
sel = d[indices]
sel[0] = 999
print("sel =", sel)
print("d (unchanged) =", d)
d = [0. 0.2 0.4 0.6 0.8 1. ]
selected = [0. 0.4 0.8]
sel = [9.99e+02 4.00e-01 8.00e-01]
d (unchanged) = [0. 0.2 0.4 0.6 0.8 1. ]
e = np.arange(10)
print("e =", e)
# set a slice to a single value
e[2:5] = 99
print("e after e[2:5] = 99 →", e)
# reset
e = np.arange(10)
# multiply a slice
e[1:6] *= 10
print("e after e[1:6] *= 10 →", e)
# use boolean mask to change only even numbers
mask_even = (e % 2 == 0)
e[mask_even] = -1
print("e after e[mask_even] = -1 →", e)
e = [0 1 2 3 4 5 6 7 8 9]
e after e[2:5] = 99 → [ 0 1 99 99 99 5 6 7 8 9]
e after e[1:6] *= 10 → [ 0 10 20 30 40 50 6 7 8 9]
e after e[mask_even] = -1 → [-1 -1 -1 -1 -1 -1 -1 7 -1 9]