Lightning fast Python with Numba

Reading time ~7 minutes

Even if you have numpy which uses all available cores of your processor your programs can still be pretty slow. This is especially the case if you process massive datasets, performing a lot of data transformations and calculations.

Switch to vector operations

The very first hint would be to quit pythonic-way of programming and get rid of list comprehensions and all other loops as these make your programs as fast as snails.

Instead, use numpy’s vector operations which bring you a significant performance improvement. For instance, you have a simple vector each element of which contains a timestamp and need to know all time intervals. Let’s write a simple code containing both a list comprehension and a vector operation:

import numpy as np
# x contains a timeseries; fill it with random data for a test
x = np.random.random_int(1e7)
# this is a classic pythonic list comprehension
intervals = [x[i] - x[i-1] for i in range(1, len(x))]
# and a vector subtraction which gives the very same result
intervals = x[1:] - x[:-1]

But compare the performance:

list comprehension: 5642 ms
vector operation: 32 ms

This is an enormous speed up - 176 times faster. Those who have mastered vector operations and never write loops and list comprehensions anymore believe that this is the fastest way of data processing. And they can’t be more wrong…

Numba makes it even faster

To put it simply, Numba is a just-in-time compiler that makes your Python code as fast as a code written in C. All you need is add a decorator to your usual python function (you have to comply to certain constraints, though, as not any code can be made fast). Let me demonstrate it on the simplest example:

import numpy as np
from numba import jit

def python_f(x):
  s = 0
  for i in range(len(x)):
    s += x[i]
  return s

def numpy_f(x):
  return x.sum()

@jit
def numba_f(x):
  s = 0
  for i in range(len(x)):
    s += x[i]
  return s

# data
x = np.random.random_int(3e7)
# try them all
python_f(x)
numpy_f(x)
numba_f(x)

And now execution times:

python: 8641 ms
numpy: 52 ms
numba (first call): 116 ms
numba (next calls): 35 ms

The first call of a numba function includes compilation, that is why it takes longer. But when compiled it is 1.5 times faster than highly optimized numpy code. This is just unbelievable! The simplest summation gives you an immensive performance increase. But this is not all numba can do for you.

Vectorize everything

While you can perform a vector-wide operations and data transformations, numpy is great. However, eventually you bump into a situation when you need some element-wise logic… and numpy just stops delivering here.

import numpy as np
from numba import vectorize

def python_f(x,y):
  if x > y:
     m = x
  else:
     m = y
  return m

@vectorize
def numba_f(x,y):
  if x > y:
     m = x
  else:
     m = y
  return m

# your data, 10 million items
size = 1e7
x = np.random.randint(0, 20, size=size)
y = np.random.randint(0, 20, size=size)

# pythonic way of element-wise operations
[python_f(x[i], y[i]) for i in range(len(x))]
# numpy has its own vectorize
numpy_f = np.vectorize(python_f)
numpy_f(x, y)
# but numba's vectorize is what we're after
numba_f(x, y)

You might be eager to see the performance, don’t you? Here it is:

python: 6657 ms
numpy: 3068 ms
numba (first call): 70 ms
numba (next calls): 24 ms

Now you see it - numba is 127 times faster than numpy.

Show your types

As you could notice, the first call to numba functions takes much longer as it includes a compilation phase. Fortunately, you can avoid this with a so-called eager, or decoration time, compilation when you pass type signatures to the decorator.

@vectorize([int32(int32, int32)])
def numba_f(x,y):
  if x > y:
     m = x
  else:
     m = y
  return m

You can pass several signatures if your work with different data types in the same function:

@vectorize([int32(int32, int32), 
            int64(int64, int64), 
            float32(float32, float32), 
            float64(float64, float64)])

Now your first call will be fast as well, since compilation will be performed in advance. However, if you call this function with data of some other types you will get a run-time error. So be cautious and think carefully about your data and how to process it.