Timing Python code and speeding things up with Numba

For many small coding tasks, we use the computer to analyze a small amount of data, or we use it to simulate a simple system and speed is not really an issue. However, there are certainly many circumstances where speed is important. This post will show

1. how to use python's *timeit* module to time code snippets and functions
2. how to use numba to speed up code (significantly---often by more than 10x)

Using timeit to test small code snippets

Python has a nice module that can be used to time code (both from a bash terminal and from within a Jupyter Notebook or ipython console.) You can read about the timeit class at the Python 3.6 Documentation page for the timeit module; here I will confine myself to the timeit.timeit() function. The Python 3.6 Documentation defines this function as

timeit.timeit(stmt='pass', setup='pass', timer=, number=1000000, globals=None)

Create a Timer instance with the given statement, setup code and timer function and run its timeit() method with number executions. The optional globals argument specifies a namespace in which to execute the code.

The timeit.timeit() function returns a the time needed in units of seconds. Be careful when executing this function as it defaults to a million executions, and this could be prohibitively long for a complex function.

Let's start with a simple example and get progressively fancier.

Timing a simple statement

Here's how to use the function timeit.timeit() to time the execution time of numpy's $\sin$ function; specifically, I'll time how long in seconds it takes numpy to execute the $\sin(\pi/2)$ for number=10,000 times.

In [1]:
import math as m 
import numpy as np
import timeit 
executionTimeNP = timeit.timeit(stmt = 'np.sin(np.pi/2)',
                                   setup = 'import numpy as np',
                                   number = 10000)
print('time in milli-seconds = ',1000*executionTimeNP)
time in milli-seconds =  6.780611001886427

Suppose you wanted to see how the exectution speed of numpy compares to the python math library's $\sin(x)$ function; this is easy to test---the numpy library is not as fast as the math library for repeated evaluations of $\sin(x)$:

In [2]:
executionTimeMath = timeit.timeit(stmt = 'm.sin(m.pi/2)',
              setup = 'import math as m',
              number = 10000)
print('time in milli-seconds = ',1000*executionTimeMath)
print('speedup = ', executionTimeNP/executionTimeMath)
time in milli-seconds =  1.5877410041866824
speedup =  4.2706026889818745

Now, since numpy is optimized to deal with matrices, you might expect that we could run the code faster if we simply created a numpy array of length 10,000 and with each element set to $\pi/2$ and then evaluating the $\sin$ of each element. Let's see if this is faster. To do so, I will write two lines of code; the first makes a 10,000 element array filled with $\pi/2$, and the second line directs numpy to take the $sin$ of each element. This code is assigned to a string variable with triple quotes, which the timeit() function will strip off before evaluation.

In [3]:
s = '''
angles = (np.pi/2)*np.ones(10000)
sines = np.sin(angles)
'''
executionTimeNP_Array = timeit.timeit(stmt = s,
                          setup = 'import numpy as np',
                          number = 1)
print('time in milli-seconds = ', 1000*executionTimeNP_Array)
print('speedup = ', executionTimeNP/executionTimeNP_Array)
time in milli-seconds =  0.4983300023013726
speedup =  13.60666821297617

Timing a function

Suppose you have a more complicated piece of code; one method would be to assign the multiline python commands as a string (as in previous example); another is to define a function. To time a function, we can use the timeit.timeit() fuction and add the globals=globals() to be able to access the function definition that in the global namespace without having to explicitly import the function:

In [4]:
def sillyFunc():
    m.sin(m.pi/2)
    return
executionTimeM = timeit.timeit(stmt='sillyFunc', 
                               setup = 'import math as m',
                               number = 100000,
                               globals=globals())
print('time in milli-seconds = ', 1000*executionTimeM)
time in milli-seconds =  2.6494759949855506

Using NUMBA to speed up raw python code

The moral of the above tests are that if you can cast an operation into an array format, then you can significantly speed things up by using numpy. However, not all code can be "vectorized" in this manner. For more complex code, another option is to use Numba (see the numba documentation for more info). Numba takes a chunk of python code and using "just-in-time" compiling to turn the python code into machine level code, and allows one to achieve C-like speed (well, close anyway).

Simple function with no arguments

Here we simply use the exact same format as the previous example, with the additon of importing numba's "Just In Time" or jit compiler. Notice that this runs about 10x faster than the math library version in the previous executable cell, and twice as fast as the numpy array method.

In [5]:
from numba import jit
# jit decorator tells Numba to compile this function.
# The argument types will be inferred by Numba when function is called.
@jit
def timeNUMBA():
    m.sin(m.pi/2)
    return

executionTimeNumba = timeit.timeit(stmt='timeNUMBA', 
                                   setup = 'import math as m',
                                   number = 10000,
                                   globals=globals())
print('time in milli-seconds = ', 1000*executionTimeNumba)
time in milli-seconds =  0.26815501041710377

Function with multiple arguments

The same idea works here; just specify the arguments in the timeit call:

In [6]:
@jit
def timeNUMBA_2(a,theta):
    a*m.sin(theta)
    return

A problem with Numba?

Watch what happens if I time this two-argument function:

In [7]:
executionTimeMult = timeit.timeit(stmt = 'timeNUMBA_2(1.0,m.pi/2)', 
                                  setup ='import math as m',
                                  number=10000, 
                                  globals=globals())
print('time in milli-seconds = ', 1000*executionTimeMult)
time in milli-seconds =  110.94460000458639

The odd issue here is that the code above took WAY longer than you might expect. However, in the next cell, you'll see that for some reason the second execution time didn't suffer this problem. My guess is that somehow it has to do with numba itself; if, after defining the function timeNUMBA_2(a,theta), I add a single call to the function, then this seems to fix the problem. Try it and see; add the call as shown below to the cell two up from here, and rerun the notebook, and you'll see that the problem goes away.

@jit
def timeNUMBA_2(a,theta):
    a*m.sin(theta)
    return
timeNUMBA_2(1.0, m.pi/2)  # add this line, restart the kernal and re-run all cells
In [8]:
timeMN2 = timeit.timeit('timeNUMBA_2(1.0,m.pi/2)', 
                            setup ='import math as m',
                            number=10000, 
                            globals=globals())

print('time in milli-seconds = ', 1000*timeMN2)
time in milli-seconds =  3.397669002879411

Using the timeit module to time code is a convienent and relatively simple way to test your python code; Numba can significantly speed up code too, but apparently there is a bug in numba that it is important to be aware of...if there is a fix or anyone can explain how to avoid this, I'd be delighted to know.