Now you know how to speed up data processing with Numba. Today we will dig deeper into vectorization, window calculations and real multithreading in Python.
Vectorizing windows
So, you can replace looping over a vector with a single vector operation, which is over 200 times faster. But what if you need a sliding calculation such as a moving average?
For simple calculations you can use an algorithmic trick, if you can find one. Generally speaking, though, you will eventually end up with a loop of some kind, so you should be able to make it fast. This is where numba.guvectorize comes into play.
Now compare the performance:
And Numba wins again, running more than 1.8 times faster! Keep in mind that NumPy relies here on highly optimized C functions, yet fairly simple code with a Numba decorator is still faster.
Unfortunately, there are only a few functions like cumsum() that can help you avoid iterating over a NumPy array. So when you need more sophisticated calculations over sliding windows (and you will definitely need them sooner or later), you can use numba.guvectorize to keep your code clean and fast at the same time. This can improve performance by one to two orders of magnitude.
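For reference, the cumsum() trick for a moving average can be sketched in pure NumPy as follows. The function name moving_average_cumsum and the NaN padding of the first window - 1 positions are assumptions made for this example.

```python
import numpy as np


def moving_average_cumsum(values, window):
    # Prefix sums with a leading zero, so that the sum of values[i:j]
    # equals csum[j] - csum[i].
    csum = np.concatenate(([0.0], np.cumsum(values, dtype=np.float64)))
    out = np.full(values.shape[0], np.nan)
    # Each full window is the difference of two prefix sums.
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


data = np.arange(10, dtype=np.float64)
averaged = moving_average_cumsum(data, 3)
```

This works because a moving sum happens to be expressible through prefix sums. Window functions such as a moving median or a custom weighted statistic have no such shortcut, which is where an explicit compiled loop pays off.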
Multithreaded Python, really?
Due to the Global Interpreter Lock (GIL), only one thread executes Python code at any given time. As a result, you can't get real parallelism from threads. This problem can be partially solved with gevent if your code is I/O-bound. For computationally intensive programs, however, you have to resort to multiprocessing, and processes are heavyweight, require more complicated data handling and need more sophisticated interprocess data exchange.
Fortunately, you can release the GIL and take advantage of fully functional threads in Python. All you need to do is add the nogil option to the jit decorator.
Did it bring a noticeable performance improvement? Yes, it did.
Multithreading gives a fivefold speedup. So you no longer have to write C extensions to achieve real parallelism with threads.
P.S. This post is not a replacement for the Numba documentation. Please read it carefully, as there are a few constraints and important notes that might influence your code and even your program design.