Scatterplot with marginal histograms

Here’s a code snippet that created a scatterplot and adds marginal histograms.

using Distributions, Plots, Measures, LaTeXStrings
# set nice fonts and defaults
default(fontfamily = "Computer Modern",titlefont = (14), legend=false, 
            guidefont = (14, :darkgreen), tickfont = (12, :black), 
            framestyle = :box, yminorgrid = true, xminorgrid= true, 
  , dpi=218)

x, y = rand(Normal(), 1000), rand(Normal(), 1000)
layout = @layout [a              _ 
                  b{0.75w,0.75h} c]

plot(layout = layout, link = :both, size = (600, 600))
scatter!(x,y, subplot = 2, framestyle = :box, 
         xlims=(-5,5), ylims=(-5,5), alpha=0.25,
         xlabel=L"\Delta x\,\mathrm{(m)}", ylabel=L"\Delta y\,\mathrm{(m)}")
histogram!([x y], subplot = [1 3], orientation = [:v :h], 
           framestyle = :none, legend=false, bins=-5:0.25:5,
           margin=-12mm, color= :green, alpha=0.5)

Here’s the output:

Posted in array, Histograms, Julia, Plots.jl, Plotting | Tagged , , | Leave a comment

Resample a vector of Matrices

When performing a simulation of a complex system, one often uses a small time step, resulting in large resulting solution arrays. Given Julia’s excellent speed, creating these arrays is often not terribly time consuming, but one does run into trouble when attempting to plot arrays with tens of millions of points.

A nice solution is to simply resample the solution arrays and plot the resulting arrays.

Suppose you have a Vector of Matrices. For a concrete example: suppose you have a three element vector, each element being an Nx2 matrix. Then, we can define a function resample_matrix which starts at the first row and samples every nth row until the end of the matrix:

resample_matrix(m::Matrix{Float64}, n::Int) = m[1:n:end, : ]   # resample every nth row

Now we can take an input vector (which I’ll call V) and create an array V_resampled where I sample every tenth point:

V_resampled = map(m->resample_matrix(m,10), V)

The map function takes the first element of V (an Nx2 matrix) and feeds that into the resample_matrix function where n=10. Then, the map function continues with this process for each element of V.

Yes, I know I haven’t made the resample_matrix function user-friendly by including any error checking etc. The point here is to just give a quick code example that will work.

Posted in Array, Julia, resampling | Tagged , , | Leave a comment

Convert a Vector of Vectors to an Array

It’s often the case in physics that one deals with a vector (i.e. a 1d array or list), where each entry is also a vector. Here’s some code which takes a list of vectors and converts them into an array in Julia.

The code is also available in a public gist at

Here is the code (tested in Julia 1.8.5)

# Convert a vector of vectors to an array
This function assumes that each vector is the same length;
v1 = [ [1,2], [2,3], [5,6] ] # this is fine
v2 = [ [1,2,8], [2,3], [5,6] ] # this is not fine

Example of usage:
vv = [[1,2,3], [4,5,6], [7,8,9],[12,13,14]]
println("num components in each vector = ", length(vv[1]))
println("num of vectors = ", length(vv))
function vecvec_to_array(vecvec)
    dim1 = length(vecvec)
    dim2 = length(vecvec[1])
    my_array = zeros(Int64, dim1, dim2)
    for i in 1:dim1
        for j in 1:dim2
            my_array[i,j] = vecvec[i][j]
    return my_array
Posted in Array, Julia | Leave a comment

Binomial Smoothing in Julia

Every experimental measurement in physics includes sources of noise. Of course, we try to minimize the sources of noise, but there is a limit to this ability (thermal noise, for instance, can be minimized, but not eliminated). Consequently, it often makes sense to smooth a data set to help reduce random noise in our data. One of the common methods is to use binomial smoothing.

A single pass of binomial smoothing applied to time series data {x} with N points from i=1 to i=N replaces x_i with

x_i = (x_{i-1} + 2x_i + x_{i+1})/4

The values of the first and last elements of the original time series are preserved if we extend the endpoints cleverly so that x_0 = x_1 - (x_2 - x_1) = 2x_1 - x_2 and x_{N+1} = x_N - (x_{N-1} - x_N) = 2x_N - x_{N-1}.

This smoothing amounts to weighting each point by the 2nd row in Pascal’s triangle (1 2 1) and dividing the result by the sum of their values (2^2 = 4). In this case, I consider the peak of Pascal’s triangle to be the zeroth row. If we execute another pass of binomial smoothing on the time series, this amounts to weighting the original data set with the 4th row (1 4 6 4 1) in Pascal’s triangle (sum = 2^4 = 16).

So, to execute N passes of binomial smoothing, we simple execute the (1 2 1) smoothing N times. This is more simply achieved using the algorithm of Marchand and Marmet: see Binomial smoothing filter: A way to avoid some pitfalls of least‐squares polynomial smoothing, Review of Scientific Instruments 54, 1034 (1983);

Because I have not found a Julia package that implements binomial smoothing, I have implemented the algorithm they describe in their 1983 paper and posted the code on GitHub at

The documentation illustrates the usage of the function, and shows several examples, including an interactive example which illustrates simultaneously the effect of the smoothing on the original time series data as well as the power spectrum of the data.

Posted in data analysis, Interactive, Julia, Smoothing | Tagged , , , , | Leave a comment

Numerical Solution for bound states of the quantum mechanical finite square well.

A standard problem in introductory quantum mechanics is to solve for the allowed bound state energies for a particle in a finite potential well. In the attached Jupyter Notebook, I describe this problem, and work out the numerical solutions, and display the results both graphically and numerically.

The code is easily modified for any finite well; shown in the notebook is the solution for an electron in a well with depth 80 eV and width 0.4 nm. You can find the Jupyter Notebook file (and the needed associated image file) at my Scipyscriptrepo GitHub site, or view the notebook using this nbviewer link.

Posted in Uncategorized | Leave a comment

Timing Python Code and Speeding things up with Numba

For many small coding tasks, we simply use the computer to analyze a small amount of data, or simulate a simple system and speed is not really an issue. However, there are certainly many circumstances where speed is important. This post will show

1. how to use python’s *timeit* module to time code snippets and functions
2. how to use numba to speed up code (significantly—often by more than 10x)

Using timeit to test small code snippets
Python has a nice module that can be used to time code (both from a bash terminal and from within a Jupyter Notebook or ipython console.) You can read about the timeit class at the Python 3.6 Documentation page for the timeit module; here I will confine myself to the timeit.timeit() function. The Python 3.6 Documentation defines this function as

timeit.timeit(stmt='pass', setup='pass', timer=, number=1000000, globals=None)

Create a Timer instance with the given statement, setup code and timer function and run its timeit() method with number executions. The optional globals argument specifies a namespace in which to execute the code.

The timeit.timeit() function returns a the time needed in units of seconds. Be careful when executing this function as it defaults to a million executions, and this could be prohibitively long for a complex function.
Let’s start with a simple example and get progressively fancier.

Click on this link: NumbaExample for a static html version of the Jupyter Notebook, and my GitHub site for a link to the actual notebook.


Posted in Jupyter Notebook, Numba, Timing Code | 2 Comments

Ideal Gas in a Box

A box with 10 particles starting with 1 on the left and 9 on the right with 100 steps taken. Three different runs are shown: Blue: 1 run,  Green: average of 10 runs, Red: average of 1000 runs.
A box with 10 particles starting with 1 on the left and 9 on the right with 100 steps taken. Three different runs are shown: Blue: 1 run, Green: average of 10 runs, Red: average of 1000 runs.

I’m teaching Statistical and Thermal Physics this semester using Gould and Tobochnik’s text of the same name. The text comes with Java programs to run simulations to help students (and me!) gain understanding about how systems with large numbers of particles behave.

On pages 9-10 of Statistical and Thermal Physics, Gould/Tobochnik describe a simple model of a non-interacting ideal gas which I’ve implemented as an iPython Notebook; the algorithm loads N particles in a box with nLeft particles initially in the left half (and consequently N-nLeft on the right half), and then generates a random number (r) between zero and one and compares this number to the ratio of nLeft/N. If r <= \mathrm{nLeft}/N, then a particle is moved from the left to the right, otherwise a particle is moved from the right to the left. This process is run repeatedly (user specified timeSteps for each run) to simulate the evolution of the system as a function of the number of steps taken (this isn’t a proper time evolution, as we’re not solving Newton’s Laws here)

The notebook  IdealGasInABox.ipynb implements this, and allows one to easily run 1, 10, or any number of simulations and average the results to show the number of particles on the left side of the box as a function of the number of steps taken. Above you can see an example plot from the code.

If you’re new to the iPython Notebook Viewer, here’s the scoop: The link above will render the notebook in your browser, and at the top right of the rendered page will be a download link. My advice is to right click the link and select Save As… to save the .ipynb file to your hard drive. You may have to edit the file name, because I find on my Mac that the file is saved with an unneeded .txt extension at the end. I remove the .txt and then my mac recognizes this as a iPython notebook that can be opened with Enthought’s Canopy application.


Posted in Ideal Gas, iPython, iPython Notebook, Matplotlib, Plotting, Statistical Physics | Leave a comment

Data fitting with fit uncertainties

(Updated 11-Jan-2022; fixed broken links and updated notebook)

This started out as a way to make sure I understood the numpy array slicing methods, and builds on my previous post about using scipy to fit data. I define a 3 parameter exponential decay

    \[\mathrm{func}(x, a, b, c) = a e^{-b x} + c\]

add some gaussian noise, and then use scipy to get the best fit as well as the covariance matrix. My understanding is that the square root of the diagonal elements gives me the 1 \sigma uncertainty on the corresponding fit parameter. So I then use the uncertainties on a, b, c to compute all 8 possible effective parameter values and their corresponding fit arrays.

I then use numpy to find the standard deviation of the 8 different fit values at each x, and use this as the 1\;\sigma uncertainty on the fit at a given x. Once I have this array of fit uncertainties, I plot the best fit curve, the fit+ 1\sigma curve, the fit - 1\sigma curve, and use the matplotlib ) function to plot the \pm 3\sigma bars.

In any case, here is a Jupyter Notebook with some exposition: CurveFitWith1SigmaBand.ipynb

Here is the final plot: (click on link to see .pdf version)


Posted in curve fitting, data analysis, error bars, iPython, Matplotlib, Plotting | 4 Comments

Fitting data with SciPy

Here’s a common thing scientists need to do, and it’s easy to accomplish in python. Suppose that you have a data set consisting of temperature vs time data for the cooling of a cup of coffee. We’ll start by importing the needed libraries and defining a fitting function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fitFunc(t, a, b, c):
    return a*np.exp(-b*t) + c

Now we create some fake data as numpy arrays and add some noise; the fake data will be called noisy:

t = np.linspace(0,4,50)
temp = fitFunc(t, 2.5, 1.3, 0.5)
noisy = temp + 0.25*np.random.normal(size=len(temp))

The scipy.optimize module contains a least squares curve fit routine that requires as input a user-defined fitting function (in our case fitFunc ),  the x-axis data (in our case, t) and the y-axis data (in our case, noisy). The curve_fit routine returns an array of fit parameters, and a matrix of covariance data (the square root of the diagonal values are the 1-sigma uncertainties on the fit parameters—provided you have a reasonable fit in the first place.):

fitParams, fitCovariances = curve_fit(fitFunc, t, noisy)
print fitParams
print fitCovariance


[ 2.42573207  1.38417604  0.57396876]
[[ 0.01738482  0.00815523 -0.00077816]
 [ 0.00815523  0.02509109  0.00606179]
 [-0.00077816  0.00606179  0.00292134]]

Now we plot the data points with error bars, plot the best fit curve, and label the axes:

plt.ylabel('Temperature (C)', fontsize = 16)
plt.xlabel('time (s)', fontsize = 16)
# plot the data as red circles with vertical errorbars
plt.errorbar(t, noisy, fmt = 'ro', yerr = 0.2)
# now plot the best fit curve and also +- 1 sigma curves
# (the square root of the diagonal covariance matrix  
# element is the uncertianty on the fit parameter.)
sigma = [fitCovariance[0,0], \
         fitCovariance[1,1], \
         fitCovariance[2,2] \
plt.plot(t, fitFunc(t, fitParams[0], fitParams[1], fitParams[2]),\
         t, fitFunc(t, fitParams[0] + sigma[0], fitParams[1] - sigma[1], fitParams[2] + sigma[2]),\
         t, fitFunc(t, fitParams[0] - sigma[0], fitParams[1] + sigma[1], fitParams[2] - sigma[2])\  
# save plot to a file
savefig('dataFitted.png', bbox_inches=0)

The output file looks like this:


You can see a the curve fitting routine as a python script, and you can see an ipython notebook here.





Posted in curve fitting, data analysis, error bars, Matplotlib, Plotting | Tagged , , | 4 Comments

Mt. Washington Weather analyzed with Pandas, Part 1.

I’m working on a text on computational physics whose primary goal is to create something useful for a one semester introductory course that all our physics majors (and now chemistry majors too) will be required to take. I want students to come away with tools that they’ll use later in their student careers (undergraduate & graduate) and can use for their professional work as well. The toolkit that  I want them to develop includes: being able to use LaTeX, read data files, plot and fit data, as well as the standard fare of creating simulations to learn about physical systems. After taking this class, my hope is that they will naturally turn to python/numpy/matplotlib to process & visualize their data, and know the limitations of these tools, and where to go if these limits are exceeded.

I’m realizing that this toolkit portion is more important than I’ve previously thought — when you’re in the laboratory taking data, you’ve got to analyze it, make plots, & write about your work—-these are skills that ALL scientists need, all the time, and it’s making me rethink my course as part 1 of an eventual two semester sequence, and this also changes the way I think about the text.

So, in an effort to grab the attention of the non-physicists in my course, I thought I’d work on including (early on, rather than waiting several weeks or months) something basic that all science students need to be able to do—read a  data file, filter out bad data, and visualize the data. The data set I chose to work with is the weather data from the summit of Mt. Washington, NH, the highest peak in New England, and home to some surprisingly bad weather like 231mph wind speeds.

I obtained the data from NOAA Climatic Data Center.  I used “go to map application” button and retrieve a comma-separated-value (.csv file) that looked like this

USAF, NCDC, Date, HrMn,I,Type,Dir, Q,I, Spd,Q, Temp, Q,

726130,14755,20060101,0059,7,FM-15,240,1,N, 1.0,1, -8.0,1,

726130,14755,20060101,0159,7,FM-15,180,1,N, 2.1,1, -8.0,1,  …..

etc. For about 250,000 lines — as I downloaded the weather data from 1973 to 2013. Notice that that data didn’t start from 1973. For some bizarre reason it was in a few contiguous, but out of order blocks. The columns I’m interested in here are Date, HrMn, Dir, Spd, and Temp (when you get the download link from NOAA after requesting the data, the download includes a descriptor file that details what’s  in each column. Navigating the NOAA site to finally choose and request a download link to the datais a post in itself. )

So there we are—the data’s in front of me on my hard disk, but to see it and analyze it, I need to read it, combine the date-time data, sort it in time, extract the columns I want, eliminate the bogus data points (they were set to either 999 or 999.9 in the raw data file) and finally plot it. Turns out 250,000 points is too much for matplotlib to deal with gracefully unless you plot it within an iPython notebook (at least on OS X); more about that later.

Part 2 will show how to obtain this plot:


and this summary via Python & Pandas (notice the minimum temperature of -42 C, and the maximum wind speed of 81.2 m/s (or 182 mph for the metrically disinclined):


Just for fun, here’s a plot of the Wind Speed versus the temperature:


and, interestingly, here’s a plot of the temperature vs the north component of the wind speed (positive = north, negative =south). Curiously, it seems symmetrical about the N-S direction, which surprises me:


Posted in data analysis, Interesting Links, iPython, Pandas, Plotting | Tagged , , , , | Leave a comment