1. how to use python’s *timeit* module to time code snippets and functions

2. how to use numba to speed up code (significantly—often by more than 10x)

**Using timeit to test small code snippets**

Python has a nice module that can be used to time code (both from a bash terminal and from within a Jupyter Notebook or ipython console.) You can read about the timeit class at the Python 3.6 Documentation page for the timeit module; here I will confine myself to the timeit.timeit() function. The Python 3.6 Documentation defines this function as

timeit.timeit(stmt='pass', setup='pass', timer=, number=1000000, globals=None)

*Create a Timer instance with the given statement, setup code and timer function and run its timeit() method with number executions. The optional globals argument specifies a namespace in which to execute the code.*

The timeit.timeit() function returns a the time needed in units of seconds. Be careful when executing this function as it defaults to a million executions, and this could be prohibitively long for a complex function.

Let’s start with a simple example and get progressively fancier.

Click on this link: NumbaExample for a static html version of the Jupyter Notebook, and my GitHub site for a link to the actual notebook.

]]>

I’m teaching Statistical and Thermal Physics this semester using Gould and Tobochnik’s text of the same name. The text comes with Java programs to run simulations to help students (and me!) gain understanding about how systems with large numbers of particles behave.

On pages 9-10 of Statistical and Thermal Physics, Gould/Tobochnik describe a simple model of a non-interacting ideal gas which I’ve implemented as an iPython Notebook; the algorithm loads N particles in a box with nLeft particles initially in the left half (and consequently N-nLeft on the right half), and then generates a random number (r) between zero and one and compares this number to the ratio of nLeft/N. If , then a particle is moved from the left to the right, otherwise a particle is moved from the right to the left. This process is run repeatedly (user specified timeSteps for each run) to simulate the evolution of the system as a function of the number of steps taken (this isn’t a proper time evolution, as we’re not solving Newton’s Laws here)

The notebook IdealGasInABox.ipynb implements this, and allows one to easily run 1, 10, or any number of simulations and average the results to show the number of particles on the left side of the box as a function of the number of steps taken. Above you can see an example plot from the code.

If you’re new to the iPython Notebook Viewer, here’s the scoop: The link above will render the notebook in your browser, and at the top right of the rendered page will be a download link. My advice is to right click the link and select Save As… to save the .ipynb file to your hard drive. You may have to edit the file name, because I find on my Mac that the file is saved with an unneeded .txt extension at the end. I remove the .txt and then my mac recognizes this as a iPython notebook that can be opened with Enthought’s Canopy application.

]]>

add some gaussian noise, and then use scipy to get the best fit as well as the covariance matrix. My understanding is that the square root of the diagonal elements gives me the 1 uncertainty on the corresponding fit parameter. So I then use the uncertainties on to compute all 8 possible effective parameter values and their corresponding fit arrays.

I then use numpy to find the standard deviation of the 8 different fit values at each x, and use this as the uncertainty on the fit at a given x. Once I have this array of fit uncertainties, I plot the best fit curve, the fit curve, the fit curve, and use the matplotlib plot.bar( ) function to plot the bars.

In any case, here is the script: CurveFitting.py , and here is an iPythonNotebook with some exposition: CurveFitWith1SigmaBand.ipynb .

Here is the final plot: (click on plot to see .pdf version)

]]>import numpy as np import matplotlib.pyplot as plt from scipy.optimize import curve_fit def fitFunc(t, a, b, c): return a*np.exp(-b*t) + c

Now we create some fake data as numpy arrays and add some noise; the fake data will be called *noisy:*

t = np.linspace(0,4,50) temp = fitFunc(t, 2.5, 1.3, 0.5) noisy = temp + 0.25*np.random.normal(size=len(temp))

The scipy.optimize module contains a least squares curve fit routine that requires as input a user-defined fitting function (in our case *fitFunc* ), the x-axis data (in our case, *t*) and the y-axis data (in our case, *noisy*). The curve_fit routine returns an array of fit parameters, and a matrix of covariance data (the square root of the diagonal values are the 1-sigma uncertainties on the fit parameters—provided you have a reasonable fit in the first place.):

fitParams, fitCovariances = curve_fit(fitFunc, t, noisy) print fitParams print fitCovariance

output:

[ 2.42573207 1.38417604 0.57396876] [[ 0.01738482 0.00815523 -0.00077816] [ 0.00815523 0.02509109 0.00606179] [-0.00077816 0.00606179 0.00292134]]

Now we plot the data points with error bars, plot the best fit curve, and label the axes:

plt.ylabel('Temperature (C)', fontsize = 16) plt.xlabel('time (s)', fontsize = 16) plt.xlim(0,4.1) # plot the data as red circles with vertical errorbars plt.errorbar(t, noisy, fmt = 'ro', yerr = 0.2) # now plot the best fit curve and also +- 1 sigma curves # (the square root of the diagonal covariance matrix # element is the uncertianty on the fit parameter.) sigma = [fitCovariance[0,0], \ fitCovariance[1,1], \ fitCovariance[2,2] \ ] plt.plot(t, fitFunc(t, fitParams[0], fitParams[1], fitParams[2]),\ t, fitFunc(t, fitParams[0] + sigma[0], fitParams[1] - sigma[1], fitParams[2] + sigma[2]),\ t, fitFunc(t, fitParams[0] - sigma[0], fitParams[1] + sigma[1], fitParams[2] - sigma[2])\ ) # save plot to a file savefig('dataFitted.png', bbox_inches=0)

The output file looks like this:

You can see a the curve fitting routine as a python script, and you can see an ipython notebook here.

]]>

I’m realizing that this toolkit portion is more important than I’ve previously thought — when you’re in the laboratory taking data, you’ve got to analyze it, make plots, & write about your work—-these are skills that ALL scientists need, all the time, and it’s making me rethink my course as part 1 of an eventual two semester sequence, and this also changes the way I think about the text.

So, in an effort to grab the attention of the non-physicists in my course, I thought I’d work on including (early on, rather than waiting several weeks or months) something basic that all science students need to be able to do—read a data file, filter out bad data, and visualize the data. The data set I chose to work with is the weather data from the summit of Mt. Washington, NH, the highest peak in New England, and home to some surprisingly bad weather like 231mph wind speeds.

I obtained the data from NOAA Climatic Data Center. I used “go to map application” button and retrieve a comma-separated-value (.csv file) that looked like this

USAF, NCDC, Date, HrMn,I,Type,Dir, Q,I, Spd,Q, Temp, Q,

726130,14755,20060101,0059,7,FM-15,240,1,N, 1.0,1, -8.0,1,

726130,14755,20060101,0159,7,FM-15,180,1,N, 2.1,1, -8.0,1, …..

etc. For about 250,000 lines — as I downloaded the weather data from 1973 to 2013. Notice that that data didn’t start from 1973. For some bizarre reason it was in a few contiguous, but out of order blocks. The columns I’m interested in here are Date, HrMn, Dir, Spd, and Temp (when you get the download link from NOAA after requesting the data, the download includes a descriptor file that details what’s in each column. *Navigating *the NOAA site to finally choose and request a download link to the datais a post in itself. )

So there we are—the data’s in front of me on my hard disk, but to see it and analyze it, I need to read it, combine the date-time data, sort it in time, extract the columns I want, eliminate the bogus data points (they were set to either 999 or 999.9 in the raw data file) and finally plot it. Turns out 250,000 points is too much for matplotlib to deal with gracefully unless you plot it within an iPython notebook (at least on OS X); more about that later.

**Part 2** will show how to obtain this plot:

and this summary via Python & Pandas (notice the minimum temperature of -42 C, and the maximum wind speed of 81.2 m/s (or 182 mph for the metrically disinclined):

Just for fun, here’s a plot of the Wind Speed versus the temperature:

and, interestingly, here’s a plot of the temperature vs the north component of the wind speed (positive = north, negative =south). Curiously, it seems symmetrical about the N-S direction, which surprises me:

]]>

Of course, if you are editing a post on WordPress, you should see the little “code” textual icon in the editing toolbar; just click it and you can insert exciting code:

import numpy as np import matplotlib.pyplot as plt t = linspace(0,10,200) plot( t, sin(t) ) show()]]>

Also, if you haven’t tried out the new iPython Notebook, you should give it a try; it can mix markup (i.e. text–and LaTeX mathematics!) and python code and output in mathematica style. It’s wonderful for interactive work, development, and making instructional demonstrations. You’ll need to install iPython 0.12 or later, available here.

Here’s a very handy python package: the uncertainties package. If you have setuptools installed, you can install it with

sudo easy_install --upgrade uncertainties

See the web page for more info; if you’ve ever spent 10 minutes taking a boatload of partial derivatives (and if your an experimental physicist, you have) to propagate errors, this package is an amazing time saver.

A project I’m keeping my eye on is PANDAS, which is a Python Data Analysis Library. It was created by someone working in the financial world (Wes McKinney); so I’m not clear yet that it’s the right tool for physicists, but it’s looking interesting. Here is a video introduction by the author himself.

]]>

animation.FuncAnimation(fig, simPoints, simData, blit=False, interval=10, repeat=True)

has the user function *simData* as the argument for the user function *simPoints*. These are two functions that the user has to provide. A little staring at the code, and I think you’ll quickly be able to adapt it to suit your needs.

I don’t think of this as a great example how I eventually want to construct this site, but I think I realize that I should at least start somewhere, and perhaps when I am on sabattical and have more time to devote to this, I’ll nail down some nice structures for archiving, searching and displaying code. For now, I’ll archive the code on PasteBin (link: SimpleAnimatedPlot.py ), and provide a local copy here (double clicking on the code selects it all):

""" This short code snippet utilizes the new animation package in matplotlib 1.1.0; it's the shortest snippet that I know of that can produce an animated plot in python. I'm still hoping that the animate package's syntax can be simplified further. """ import numpy as np import matplotlib.pyplot as plt import matplotlib.animation as animation def simData(): # this function is called as the argument for # the simPoints function. This function contains # (or defines) and iterator---a device that computes # a value, passes it back to the main program, and then # returns to exactly where it left off in the function upon the # next call. I believe that one has to use this method to animate # a function using the matplotlib animation package. # t_max = 10.0 dt = 0.05 x = 0.0 t = 0.0 while t < t_max: x = np.sin(np.pi*t) t = t + dt yield x, t def simPoints(simData): x, t = simData[0], simData[1] time_text.set_text(time_template%(t)) line.set_data(t, x) return line, time_text ## ## set up figure for plotting: ## fig = plt.figure() ax = fig.add_subplot(111) # I'm still unfamiliar with the following line of code: line, = ax.plot([], [], 'bo', ms=10) ax.set_ylim(-1, 1) ax.set_xlim(0, 10) ## time_template = 'Time = %.1f s' # prints running simulation time time_text = ax.text(0.05, 0.9, '', transform=ax.transAxes) ## Now call the animation package: (simData is the user function ## serving as the argument for simPoints): ani = animation.FuncAnimation(fig, simPoints, simData, blit=False,\ interval=10, repeat=True) plt.show()]]>

My intention is to create an

(a) open-source,

(b) high quality & elegant,

(c) searchable

database of python scripts (from snippets to more

extended projects) useful for scientists aiming to get things done

with python.

To do this, I’ll have to make strict standards for code,

verification of code, and documentation. This is no small task;

so if you’re a scientist with expertise in python let me know.

You can contact me through my university home page:

http://people.usm.maine.edu/~pauln/

No accounts will be created without verification.

More to come over the next few months.

]]>