Data fitting with fit uncertainties

This started out as a way to make sure I understood the numpy array slicing methods, and builds on my previous post about using scipy to fit data. I define a 3 parameter exponential decay

    \[\mathrm{func}(x, a, b, c) = a e^{-b x} + c\]

add some gaussian noise, and then use scipy to get the best fit as well as the covariance matrix. My understanding is that the square root of the diagonal elements gives me the 1 \sigma uncertainty on the corresponding fit parameter. So I then use the uncertainties on a, b, c to compute all 8 possible effective parameter values and their corresponding fit arrays.

I then use numpy to find the standard deviation of the 8 different fit values at each x, and use this as the 1\;\sigma uncertainty on the fit at a given x. Once I have this array of fit uncertainties, I plot the best fit curve, the fit+ 1\sigma curve, the fit - 1\sigma curve, and use the matplotlib ) function to plot the \pm 3\sigma bars.

In any case, here is the script: , and here is an iPythonNotebook with some exposition: CurveFitWith1SigmaBand.ipynb .

Here is the final plot: (click on plot to see .pdf version)

Fitting data with SciPy

Here’s a common thing scientists need to do, and it’s easy to accomplish in python. Suppose that you have a data set consisting of temperature vs time data for the cooling of a cup of coffee. We’ll start by importing the needed libraries and defining a fitting function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fitFunc(t, a, b, c):
    return a*np.exp(-b*t) + c

Now we create some fake data as numpy arrays and add some noise; the fake data will be called noisy:

t = np.linspace(0,4,50)
temp = fitFunc(t, 2.5, 1.3, 0.5)
noisy = temp + 0.25*np.random.normal(size=len(temp))

The scipy.optimize module contains a least squares curve fit routine that requires as input a user-defined fitting function (in our case fitFunc ),  the x-axis data (in our case, t) and the y-axis data (in our case, noisy). The curve_fit routine returns an array of fit parameters, and a matrix of covariance data (the square root of the diagonal values are the 1-sigma uncertainties on the fit parameters—provided you have a reasonable fit in the first place.):

fitParams, fitCovariances = curve_fit(fitFunc, t, noisy)
print fitParams
print fitCovariance


[ 2.42573207  1.38417604  0.57396876]
[[ 0.01738482  0.00815523 -0.00077816]
 [ 0.00815523  0.02509109  0.00606179]
 [-0.00077816  0.00606179  0.00292134]]

Now we plot the data points with error bars, plot the best fit curve, and label the axes:

plt.ylabel('Temperature (C)', fontsize = 16)
plt.xlabel('time (s)', fontsize = 16)
# plot the data as red circles with vertical errorbars
plt.errorbar(t, noisy, fmt = 'ro', yerr = 0.2)
# now plot the best fit curve and also +- 1 sigma curves
# (the square root of the diagonal covariance matrix  
# element is the uncertianty on the fit parameter.)
sigma = [fitCovariance[0,0], \
         fitCovariance[1,1], \
         fitCovariance[2,2] \
plt.plot(t, fitFunc(t, fitParams[0], fitParams[1], fitParams[2]),\
         t, fitFunc(t, fitParams[0] + sigma[0], fitParams[1] - sigma[1], fitParams[2] + sigma[2]),\
         t, fitFunc(t, fitParams[0] - sigma[0], fitParams[1] + sigma[1], fitParams[2] - sigma[2])\  
# save plot to a file
savefig('dataFitted.png', bbox_inches=0)

The output file looks like this:


You can see a the curve fitting routine as a python script, and you can see an ipython notebook here.





Mt. Washington Weather analyzed with Pandas, Part 1.

I’m working on a text on computational physics whose primary goal is to create something useful for a one semester introductory course that all our physics majors (and now chemistry majors too) will be required to take. I want students to come away with tools that they’ll use later in their student careers (undergraduate & graduate) and can use for their professional work as well. The toolkit that  I want them to develop includes: being able to use LaTeX, read data files, plot and fit data, as well as the standard fare of creating simulations to learn about physical systems. After taking this class, my hope is that they will naturally turn to python/numpy/matplotlib to process & visualize their data, and know the limitations of these tools, and where to go if these limits are exceeded.

I’m realizing that this toolkit portion is more important than I’ve previously thought — when you’re in the laboratory taking data, you’ve got to analyze it, make plots, & write about your work—-these are skills that ALL scientists need, all the time, and it’s making me rethink my course as part 1 of an eventual two semester sequence, and this also changes the way I think about the text.

So, in an effort to grab the attention of the non-physicists in my course, I thought I’d work on including (early on, rather than waiting several weeks or months) something basic that all science students need to be able to do—read a  data file, filter out bad data, and visualize the data. The data set I chose to work with is the weather data from the summit of Mt. Washington, NH, the highest peak in New England, and home to some surprisingly bad weather like 231mph wind speeds.

I obtained the data from NOAA Climatic Data Center.  I used “go to map application” button and retrieve a comma-separated-value (.csv file) that looked like this

USAF, NCDC, Date, HrMn,I,Type,Dir, Q,I, Spd,Q, Temp, Q,

726130,14755,20060101,0059,7,FM-15,240,1,N, 1.0,1, -8.0,1,

726130,14755,20060101,0159,7,FM-15,180,1,N, 2.1,1, -8.0,1,  …..

etc. For about 250,000 lines — as I downloaded the weather data from 1973 to 2013. Notice that that data didn’t start from 1973. For some bizarre reason it was in a few contiguous, but out of order blocks. The columns I’m interested in here are Date, HrMn, Dir, Spd, and Temp (when you get the download link from NOAA after requesting the data, the download includes a descriptor file that details what’s  in each column. Navigating the NOAA site to finally choose and request a download link to the datais a post in itself. )

So there we are—the data’s in front of me on my hard disk, but to see it and analyze it, I need to read it, combine the date-time data, sort it in time, extract the columns I want, eliminate the bogus data points (they were set to either 999 or 999.9 in the raw data file) and finally plot it. Turns out 250,000 points is too much for matplotlib to deal with gracefully unless you plot it within an iPython notebook (at least on OS X); more about that later.

Part 2 will show how to obtain this plot:


and this summary via Python & Pandas (notice the minimum temperature of -42 C, and the maximum wind speed of 81.2 m/s (or 182 mph for the metrically disinclined):


Just for fun, here’s a plot of the Wind Speed versus the temperature:


and, interestingly, here’s a plot of the temperature vs the north component of the wind speed (positive = north, negative =south). Curiously, it seems symmetrical about the N-S direction, which surprises me:


How to post Python source code on WordPress

I’m back to posting now that I am on sabbatical; things have evolved on WordPress in the last year, and now it’s easy to post source code. See the following link so that you can see read about how easy it is to include code in WordPress.

Of course, if you are editing a post on WordPress, you should see the little “code” textual icon in the editing toolbar; just click it and you can insert exciting code:

import numpy as np
import matplotlib.pyplot as plt

t = linspace(0,10,200)
plot( t, sin(t) )

Miscellaneous Python Links

Here’s a nice link to a pdf file with a summary of matlab, NumPy, and R commands.Very handy for people like me that cannot remember all the various commands one needs to get something done.

Also, if you haven’t tried out the new iPython Notebook, you should give it a try; it can mix markup (i.e. text–and LaTeX mathematics!) and python code and output in mathematica style. It’s wonderful for interactive work, development, and making instructional demonstrations. You’ll need to install iPython 0.12 or later, available here.

Here’s a very handy python package: the uncertainties package. If you have setuptools installed, you can install it with

sudo easy_install --upgrade uncertainties

See the web page for more info; if you’ve ever spent 10 minutes taking a boatload of partial derivatives (and if your an experimental physicist, you have)  to propagate errors, this package is an amazing time saver.

A project I’m keeping my eye on is PANDAS, which is a Python Data Analysis Library. It was created by someone working in the financial world (Wes McKinney); so I’m not clear yet that it’s the right tool for physicists, but it’s looking interesting. Here is a video introduction by the author himself.




Simple Animated Plot with Matplotlib

Here’s a simple script which is a good starting point for animating a plot using matplotlib’s animation package (which, by their own admission, is really in a beta status as of matplotlib 1.1.0). I find the code needed to perform the animation more cumbersome than I’d like, but importantly, it’s not too cumbersome. In line2 50-51 of the code, where the actual animation call is made, note that

animation.FuncAnimation(fig, simPoints, simData, blit=False,
   interval=10, repeat=True)

has the user function simData as the argument for the user function simPoints. These are two functions that the user has to provide. A little staring at the code, and I think you’ll quickly be able to adapt it to suit your needs.

I don’t think of this as a great example how I eventually want to construct this site, but I think I realize that I should at least start somewhere, and perhaps when I am on sabattical and have more time to devote to this, I’ll nail down some nice structures for archiving, searching and displaying code. For now, I’ll archive the code on PasteBin  (link: ), and provide a local copy here (double clicking on the code selects it all):

This short code snippet utilizes the new animation package in
matplotlib 1.1.0; it's the shortest snippet that I know of that can
produce an animated plot in python. I'm still hoping that the
animate package's syntax can be simplified further. 
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

def simData():
# this function is called as the argument for
# the simPoints function. This function contains
# (or defines) and iterator---a device that computes
# a value, passes it back to the main program, and then
# returns to exactly where it left off in the function upon the
# next call. I believe that one has to use this method to animate
# a function using the matplotlib animation package.
    t_max = 10.0
    dt = 0.05
    x = 0.0
    t = 0.0
    while t < t_max:
        x = np.sin(np.pi*t)
        t = t + dt
        yield x, t

def simPoints(simData):
    x, t = simData[0], simData[1]
    line.set_data(t, x)
    return line, time_text

##   set up figure for plotting:
fig = plt.figure()
ax = fig.add_subplot(111)
# I'm still unfamiliar with the following line of code:
line, = ax.plot([], [], 'bo', ms=10)
ax.set_ylim(-1, 1)
ax.set_xlim(0, 10)
time_template = 'Time = %.1f s'    # prints running simulation time
time_text = ax.text(0.05, 0.9, '', transform=ax.transAxes)
## Now call the animation package: (simData is the user function
## serving as the argument for simPoints):
ani = animation.FuncAnimation(fig, simPoints, simData, blit=False,\
     interval=10, repeat=True)

First Post

This is the beginning of the Scientific Python Script Repository.
My intention is to create an
(a) open-source,
(b) high quality & elegant,
(c) searchable
database of  python scripts (from snippets to more
extended projects) useful for scientists aiming to get things done
with python.

To do this, I’ll have to make strict standards for code,
verification of code, and documentation. This is no small task;
so if you’re a scientist with expertise in python let me know.

You can contact me through my university home page:

No accounts will be created without verification.

More to come over the next few months.