Binomial Smoothing in Julia

Every experimental measurement in physics includes sources of noise. Of course, we try to minimize the sources of noise, but there is a limit to this ability (thermal noise, for instance, can be minimized, but not eliminated). Consequently, it often makes sense to smooth a data set to help reduce random noise in our data. One of the common methods is to use binomial smoothing.

A single pass of binomial smoothing applied to time series data {x} with N points from i=1 to i=N replaces x_i with

x_i = (x_{i-1} + 2x_i + x_{i+1})/4

The values of the first and last elements of the original time series are preserved if we extend the endpoints cleverly so that x_0 = x_1 - (x_2 - x_1) = 2x_1 - x_2 and x_{N+1} = x_N - (x_{N-1} - x_N) = 2x_N - x_{N-1}.

This smoothing amounts to weighting each point by the 2nd row in Pascal’s triangle (1 2 1) and dividing the result by the sum of their values (2^2 = 4). In this case, I consider the peak of Pascal’s triangle to be the zeroth row. If we execute another pass of binomial smoothing on the time series, this amounts to weighting the original data set with the 4th row (1 4 6 4 1) in Pascal’s triangle (sum = 2^4 = 16).

So, to execute N passes of binomial smoothing, we simple execute the (1 2 1) smoothing N times. This is more simply achieved using the algorithm of Marchand and Marmet: see Binomial smoothing filter: A way to avoid some pitfalls of least‐squares polynomial smoothing, Review of Scientific Instruments 54, 1034 (1983);

Because I have not found a Julia package that implements binomial smoothing, I have implemented the algorithm they describe in their 1983 paper and posted the code on GitHub at

The documentation illustrates the usage of the function, and shows several examples, including an interactive example which illustrates simultaneously the effect of the smoothing on the original time series data as well as the power spectrum of the data.

Posted in data analysis, Interactive, Julia, Smoothing | Tagged , , , , | Leave a comment

Numerical Solution for bound states of the quantum mechanical finite square well.

A standard problem in introductory quantum mechanics is to solve for the allowed bound state energies for a particle in a finite potential well. In the attached Jupyter Notebook, I describe this problem, and work out the numerical solutions, and display the results both graphically and numerically.

The code is easily modified for any finite well; shown in the notebook is the solution for an electron in a well with depth 80 eV and width 0.4 nm. You can find the Jupyter Notebook file (and the needed associated image file) at my Scipyscriptrepo GitHub site, or view the notebook using this nbviewer link.

Posted in Uncategorized | Leave a comment

Timing Python Code and Speeding things up with Numba

For many small coding tasks, we simply use the computer to analyze a small amount of data, or simulate a simple system and speed is not really an issue. However, there are certainly many circumstances where speed is important. This post will show

1. how to use python’s *timeit* module to time code snippets and functions
2. how to use numba to speed up code (significantly—often by more than 10x)

Using timeit to test small code snippets
Python has a nice module that can be used to time code (both from a bash terminal and from within a Jupyter Notebook or ipython console.) You can read about the timeit class at the Python 3.6 Documentation page for the timeit module; here I will confine myself to the timeit.timeit() function. The Python 3.6 Documentation defines this function as

timeit.timeit(stmt='pass', setup='pass', timer=, number=1000000, globals=None)

Create a Timer instance with the given statement, setup code and timer function and run its timeit() method with number executions. The optional globals argument specifies a namespace in which to execute the code.

The timeit.timeit() function returns a the time needed in units of seconds. Be careful when executing this function as it defaults to a million executions, and this could be prohibitively long for a complex function.
Let’s start with a simple example and get progressively fancier.

Click on this link: NumbaExample for a static html version of the Jupyter Notebook, and my GitHub site for a link to the actual notebook.


Posted in Jupyter Notebook, Numba, Timing Code | 2 Comments

Ideal Gas in a Box

A box with 10 particles starting with 1 on the left and 9 on the right with 100 steps taken. Three different runs are shown: Blue: 1 run,  Green: average of 10 runs, Red: average of 1000 runs.
A box with 10 particles starting with 1 on the left and 9 on the right with 100 steps taken. Three different runs are shown: Blue: 1 run, Green: average of 10 runs, Red: average of 1000 runs.

I’m teaching Statistical and Thermal Physics this semester using Gould and Tobochnik’s text of the same name. The text comes with Java programs to run simulations to help students (and me!) gain understanding about how systems with large numbers of particles behave.

On pages 9-10 of Statistical and Thermal Physics, Gould/Tobochnik describe a simple model of a non-interacting ideal gas which I’ve implemented as an iPython Notebook; the algorithm loads N particles in a box with nLeft particles initially in the left half (and consequently N-nLeft on the right half), and then generates a random number (r) between zero and one and compares this number to the ratio of nLeft/N. If r <= \mathrm{nLeft}/N, then a particle is moved from the left to the right, otherwise a particle is moved from the right to the left. This process is run repeatedly (user specified timeSteps for each run) to simulate the evolution of the system as a function of the number of steps taken (this isn’t a proper time evolution, as we’re not solving Newton’s Laws here)

The notebook  IdealGasInABox.ipynb implements this, and allows one to easily run 1, 10, or any number of simulations and average the results to show the number of particles on the left side of the box as a function of the number of steps taken. Above you can see an example plot from the code.

If you’re new to the iPython Notebook Viewer, here’s the scoop: The link above will render the notebook in your browser, and at the top right of the rendered page will be a download link. My advice is to right click the link and select Save As… to save the .ipynb file to your hard drive. You may have to edit the file name, because I find on my Mac that the file is saved with an unneeded .txt extension at the end. I remove the .txt and then my mac recognizes this as a iPython notebook that can be opened with Enthought’s Canopy application.


Posted in Ideal Gas, iPython, iPython Notebook, Matplotlib, Plotting, Statistical Physics | Leave a comment

Data fitting with fit uncertainties

(Updated 11-Jan-2022; fixed broken links and updated notebook)

This started out as a way to make sure I understood the numpy array slicing methods, and builds on my previous post about using scipy to fit data. I define a 3 parameter exponential decay

    \[\mathrm{func}(x, a, b, c) = a e^{-b x} + c\]

add some gaussian noise, and then use scipy to get the best fit as well as the covariance matrix. My understanding is that the square root of the diagonal elements gives me the 1 \sigma uncertainty on the corresponding fit parameter. So I then use the uncertainties on a, b, c to compute all 8 possible effective parameter values and their corresponding fit arrays.

I then use numpy to find the standard deviation of the 8 different fit values at each x, and use this as the 1\;\sigma uncertainty on the fit at a given x. Once I have this array of fit uncertainties, I plot the best fit curve, the fit+ 1\sigma curve, the fit - 1\sigma curve, and use the matplotlib ) function to plot the \pm 3\sigma bars.

In any case, here is a Jupyter Notebook with some exposition: CurveFitWith1SigmaBand.ipynb

Here is the final plot: (click on link to see .pdf version)


Posted in curve fitting, data analysis, error bars, iPython, Matplotlib, Plotting | 4 Comments

Fitting data with SciPy

Here’s a common thing scientists need to do, and it’s easy to accomplish in python. Suppose that you have a data set consisting of temperature vs time data for the cooling of a cup of coffee. We’ll start by importing the needed libraries and defining a fitting function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fitFunc(t, a, b, c):
    return a*np.exp(-b*t) + c

Now we create some fake data as numpy arrays and add some noise; the fake data will be called noisy:

t = np.linspace(0,4,50)
temp = fitFunc(t, 2.5, 1.3, 0.5)
noisy = temp + 0.25*np.random.normal(size=len(temp))

The scipy.optimize module contains a least squares curve fit routine that requires as input a user-defined fitting function (in our case fitFunc ),  the x-axis data (in our case, t) and the y-axis data (in our case, noisy). The curve_fit routine returns an array of fit parameters, and a matrix of covariance data (the square root of the diagonal values are the 1-sigma uncertainties on the fit parameters—provided you have a reasonable fit in the first place.):

fitParams, fitCovariances = curve_fit(fitFunc, t, noisy)
print fitParams
print fitCovariance


[ 2.42573207  1.38417604  0.57396876]
[[ 0.01738482  0.00815523 -0.00077816]
 [ 0.00815523  0.02509109  0.00606179]
 [-0.00077816  0.00606179  0.00292134]]

Now we plot the data points with error bars, plot the best fit curve, and label the axes:

plt.ylabel('Temperature (C)', fontsize = 16)
plt.xlabel('time (s)', fontsize = 16)
# plot the data as red circles with vertical errorbars
plt.errorbar(t, noisy, fmt = 'ro', yerr = 0.2)
# now plot the best fit curve and also +- 1 sigma curves
# (the square root of the diagonal covariance matrix  
# element is the uncertianty on the fit parameter.)
sigma = [fitCovariance[0,0], \
         fitCovariance[1,1], \
         fitCovariance[2,2] \
plt.plot(t, fitFunc(t, fitParams[0], fitParams[1], fitParams[2]),\
         t, fitFunc(t, fitParams[0] + sigma[0], fitParams[1] - sigma[1], fitParams[2] + sigma[2]),\
         t, fitFunc(t, fitParams[0] - sigma[0], fitParams[1] + sigma[1], fitParams[2] - sigma[2])\  
# save plot to a file
savefig('dataFitted.png', bbox_inches=0)

The output file looks like this:


You can see a the curve fitting routine as a python script, and you can see an ipython notebook here.





Posted in curve fitting, data analysis, error bars, Matplotlib, Plotting | Tagged , , | 4 Comments

Mt. Washington Weather analyzed with Pandas, Part 1.

I’m working on a text on computational physics whose primary goal is to create something useful for a one semester introductory course that all our physics majors (and now chemistry majors too) will be required to take. I want students to come away with tools that they’ll use later in their student careers (undergraduate & graduate) and can use for their professional work as well. The toolkit that  I want them to develop includes: being able to use LaTeX, read data files, plot and fit data, as well as the standard fare of creating simulations to learn about physical systems. After taking this class, my hope is that they will naturally turn to python/numpy/matplotlib to process & visualize their data, and know the limitations of these tools, and where to go if these limits are exceeded.

I’m realizing that this toolkit portion is more important than I’ve previously thought — when you’re in the laboratory taking data, you’ve got to analyze it, make plots, & write about your work—-these are skills that ALL scientists need, all the time, and it’s making me rethink my course as part 1 of an eventual two semester sequence, and this also changes the way I think about the text.

So, in an effort to grab the attention of the non-physicists in my course, I thought I’d work on including (early on, rather than waiting several weeks or months) something basic that all science students need to be able to do—read a  data file, filter out bad data, and visualize the data. The data set I chose to work with is the weather data from the summit of Mt. Washington, NH, the highest peak in New England, and home to some surprisingly bad weather like 231mph wind speeds.

I obtained the data from NOAA Climatic Data Center.  I used “go to map application” button and retrieve a comma-separated-value (.csv file) that looked like this

USAF, NCDC, Date, HrMn,I,Type,Dir, Q,I, Spd,Q, Temp, Q,

726130,14755,20060101,0059,7,FM-15,240,1,N, 1.0,1, -8.0,1,

726130,14755,20060101,0159,7,FM-15,180,1,N, 2.1,1, -8.0,1,  …..

etc. For about 250,000 lines — as I downloaded the weather data from 1973 to 2013. Notice that that data didn’t start from 1973. For some bizarre reason it was in a few contiguous, but out of order blocks. The columns I’m interested in here are Date, HrMn, Dir, Spd, and Temp (when you get the download link from NOAA after requesting the data, the download includes a descriptor file that details what’s  in each column. Navigating the NOAA site to finally choose and request a download link to the datais a post in itself. )

So there we are—the data’s in front of me on my hard disk, but to see it and analyze it, I need to read it, combine the date-time data, sort it in time, extract the columns I want, eliminate the bogus data points (they were set to either 999 or 999.9 in the raw data file) and finally plot it. Turns out 250,000 points is too much for matplotlib to deal with gracefully unless you plot it within an iPython notebook (at least on OS X); more about that later.

Part 2 will show how to obtain this plot:


and this summary via Python & Pandas (notice the minimum temperature of -42 C, and the maximum wind speed of 81.2 m/s (or 182 mph for the metrically disinclined):


Just for fun, here’s a plot of the Wind Speed versus the temperature:


and, interestingly, here’s a plot of the temperature vs the north component of the wind speed (positive = north, negative =south). Curiously, it seems symmetrical about the N-S direction, which surprises me:


Posted in data analysis, Interesting Links, iPython, Pandas, Plotting | Tagged , , , , | Leave a comment

How to post Python source code on WordPress

I’m back to posting now that I am on sabbatical; things have evolved on WordPress in the last year, and now it’s easy to post source code. See the following link so that you can see read about how easy it is to include code in WordPress.

Of course, if you are editing a post on WordPress, you should see the little “code” textual icon in the editing toolbar; just click it and you can insert exciting code:

import numpy as np
import matplotlib.pyplot as plt

t = linspace(0,10,200)
plot( t, sin(t) )
Posted in Interesting Links, Matplotlib, Plotting | Tagged , , | Leave a comment

Miscellaneous Python Links

Here’s a nice link to a pdf file with a summary of matlab, NumPy, and R commands.Very handy for people like me that cannot remember all the various commands one needs to get something done.

Also, if you haven’t tried out the new iPython Notebook, you should give it a try; it can mix markup (i.e. text–and LaTeX mathematics!) and python code and output in mathematica style. It’s wonderful for interactive work, development, and making instructional demonstrations. You’ll need to install iPython 0.12 or later, available here.

Here’s a very handy python package: the uncertainties package. If you have setuptools installed, you can install it with

sudo easy_install --upgrade uncertainties

See the web page for more info; if you’ve ever spent 10 minutes taking a boatload of partial derivatives (and if your an experimental physicist, you have)  to propagate errors, this package is an amazing time saver.

A project I’m keeping my eye on is PANDAS, which is a Python Data Analysis Library. It was created by someone working in the financial world (Wes McKinney); so I’m not clear yet that it’s the right tool for physicists, but it’s looking interesting. Here is a video introduction by the author himself.




Posted in data analysis, error propagation, Interesting Links, iPython, Pandas | Tagged , , , , | Leave a comment

Simple Animated Plot with Matplotlib

Here’s a simple script which is a good starting point for animating a plot using matplotlib’s animation package (which, by their own admission, is really in a beta status as of matplotlib 1.1.0). I find the code needed to perform the animation more cumbersome than I’d like, but importantly, it’s not too cumbersome. In line2 50-51 of the code, where the actual animation call is made, note that

animation.FuncAnimation(fig, simPoints, simData, blit=False,
   interval=10, repeat=True)

has the user function simData as the argument for the user function simPoints. These are two functions that the user has to provide. A little staring at the code, and I think you’ll quickly be able to adapt it to suit your needs.

I don’t think of this as a great example how I eventually want to construct this site, but I think I realize that I should at least start somewhere, and perhaps when I am on sabattical and have more time to devote to this, I’ll nail down some nice structures for archiving, searching and displaying code. For now, I’ll archive the code on PasteBin  (link: ), and provide a local copy here (double clicking on the code selects it all):

This short code snippet utilizes the new animation package in
matplotlib 1.1.0; it's the shortest snippet that I know of that can
produce an animated plot in python. I'm still hoping that the
animate package's syntax can be simplified further. 
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

def simData():
# this function is called as the argument for
# the simPoints function. This function contains
# (or defines) and iterator---a device that computes
# a value, passes it back to the main program, and then
# returns to exactly where it left off in the function upon the
# next call. I believe that one has to use this method to animate
# a function using the matplotlib animation package.
    t_max = 10.0
    dt = 0.05
    x = 0.0
    t = 0.0
    while t < t_max:
        x = np.sin(np.pi*t)
        t = t + dt
        yield x, t

def simPoints(simData):
    x, t = simData[0], simData[1]
    line.set_data(t, x)
    return line, time_text

##   set up figure for plotting:
fig = plt.figure()
ax = fig.add_subplot(111)
# I'm still unfamiliar with the following line of code:
line, = ax.plot([], [], 'bo', ms=10)
ax.set_ylim(-1, 1)
ax.set_xlim(0, 10)
time_template = 'Time = %.1f s'    # prints running simulation time
time_text = ax.text(0.05, 0.9, '', transform=ax.transAxes)
## Now call the animation package: (simData is the user function
## serving as the argument for simPoints):
ani = animation.FuncAnimation(fig, simPoints, simData, blit=False,\
     interval=10, repeat=True)
Posted in Animation, Matplotlib, Plotting | Tagged , , , , | 11 Comments