Section 13.3 Fitting a Curve to Data
Frequently, curve fitting will be used to extract results from experimental data. In this section, we demonstrate use of
scipy.optimize.curve_fit, the curve fitting function provided within SciPy.
Let’s illustrate this process through an example. Assume that we have conducted an experiment in which we investigated the behavior of a voltage divider circuit shown in Figure 13.2.1 where \(V_\text{in}\) and \(R_\text{fixed}\) are treated as being unknown. We collect measurements of \(V_\text{out}\) as we vary and measure \(R_\text{pot}\text{.}\) A curve fitting process can then be employed to find \(V_\text{in}\) and \(R_\text{fixed}\) from fit parameters, where
\begin{equation}
V_\text{out}=\frac{R_\text{pot}}{R+R_\text{pot}}\tag{13.3.1}
\end{equation}
is our fitting function. Here, an executable routine is provided that uses the
curve_fit function from the scipy.optimize module to perform a curve fit to data, with a code break-down with explanation following. When examining the results provided by this curve fitting routine, note that values of \(V_\text{in}=5.0\text{V}\) and \(R_\text{fixed}=5\text{k}\Omega\) were used when generating the synthetic data in this example.Breakdown of curve fitting routine.
Let’s take a closer look at the pieces that go into this Python routine. We start by initializing our variable and function space. We ensure that all variables in the variable-space are cleared out and don’t carry over from other code boxes in the textbook. We then import functions from NumPy, SciPy, and MatPlotLib.pyplot packages that we’ll wish to use. Finally, we close all plot windows that may exist from other instances of running code in this textbook.
# Clear all variables from previous code runs
globals().clear()
# Import functions for use in this program
import numpy as np
import scipy.optimize as opt
import matplotlib.pyplot as plt
# Close all open figures
plt.close('all')
The The first line of our function definition always begins with
curve_fit function that we intend to use must be given information about the functional form of our fitting equation. This information will be provided by a function func that we define based on Equation (13.3.1). Once defined, func will be an input provided to the curve_fit function. # Define our fitting function
def func(R, Vin, Rfix):
out = Vin * R/(Rfix + R)
return out
def, followed by the name of the function we’re defining and its arguments. The curve_fit function expects the first argument of func to represent our data’s independent variable (often the horizontal axis of a plot). In this example, R will represent \(R_\text{pot}\text{.}\) Any additional arguments specified for func represent fitting parameters that will be determined by the curve fitting process. In this example, these fit parameters Vin and Rfix will represent \(V_\text{in}\) and \(R_\text{fixed}\) respectively. The next line(s) contain the actual functional form for our fitting function which will be dictated by Equation (13.3.1) in terms of the argument names above. The out variable represents our data’s dependent variable (often the vertical axis of a plot) and represents \(V_\text{out}\) in this example. The final line of the function definition for func will return to curve_fit a value of out based on the argument values that curve_fit provided to func.
The next lines of code represent the introduction of experimental data \(\left(R_\text{pot}, V_\text{out}\right)\) into the routine. There are only ten data points in this example, so it is not too onerous to enter the data by hand directly into the routine. For larger data sets, it is instead recommended that data import routines similar to those shown in Section 13.1 be used.
# Create arrays containing data from experiment
Rpot = np.array([1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]);
Vout = np.array([0.815, 1.332, 1.880, 2.238, 2.614, 2.777, 2.881, 3.289, 2.974, 3.096]);
We are now ready to perform a curve fit to our data. The Here, we provide the name of our defined function
curve_fit function takes as inputs the name of the fitting function, the variable name for the independent variable, and the variable name for the dependent variable. Further inputs and options are available but optional, and can be found in official documentation for SciPy. # Perform curve fitting
fit, fit_cov = opt.curve_fit(func, Rpot, Vout);
func, the name of our independent variable data Rpot, and the name of our dependent variable data Vout. The curve_fit function will then return its estimate for values of the fit parameters as an array of values in fit and something called the covariance as an array of values in fit_cov. A full description of the meaning of the covariance matrix can be found in data analysis or statistics texts. For our purposes, the covariance can be used to provide estimates of the one standard deviation uncertainty (arising from scatter in the data) for each of the fit parameters stored in fit. # Calculate standard deviation on fit parameters (associated with data scatter)
sigma=np.sqrt(np.diag(fit_cov));
Estimates for the fit parameters and data-scatter uncertainties are printed for the user. In this print command, the structure of the string that will be displayed is provided first, followed by values taken from variables present in our program. As an example, the portion of the command reading
# Print values and uncertainties for fit parameters
print('Vin={0:.2f}V +/- {1:.2f}V and Rfixed={2:.2f}ohms +/- {3:.2f}ohms'.format(fit[0],sigma[0],fit[1], sigma[1]))
Rfixed={2:.2f}ohms indicates that the spot enclosed by the curly brackets should be replaced by the second element (since the number before the colon is a ‘2’) in the format attribute (fit[1] in this case). The ‘:.2f’ indicates that the number placed in this spot should be a floating point number (because of the ‘f’) that is truncated two digits after the decimal point (because of the ‘:.2’). To better demonstrate this functionality, an alternative set of print commands, shown here, could have been used to split the statement of results into two lines of text. # Print values and uncertainties for fit parameters
print('Vin={0:.2f}V +/- {1:.2f}V '.format(fit[0],sigma[0]))
print('Rfixed={0:.2f}ohms +/- {1:.2f}ohms'.format(fit[1], sigma[1]))
To produce a theory curve based on these fit parameters, we first generate an array of values for \(R_\text{pot}\) associated with our theoretical prediction, and then evaluate a theoretical prediction for \(V_\text{out}\) for each of these \(R_\text{pot}\) values. Here,
# Create arrays containing values for the best fit curve
Rpot_fit=np.linspace(0,10000, 100);
Vout_fit=fit[0]*Rpot_fit/(fit[1]+Rpot_fit);
Rpot_fit is an array containing 100 values spaced linearly in the range (0,10000). These values are plugged into Equation (13.3.1) to generate an array of values for Vout_fit .
The remainder of the code contains plotting commands along the lines of those discussed in Section 13.2.
# Plot experiment data and best fit curve on the same axes
plt.plot(Rpot, Vout, 'o', label='Data');
plt.plot(Rpot_fit, Vout_fit, label='Fit')
plt.xlim(0,10500);
plt.ylim(0,3.5);
plt.xlabel(r'Resistance ($\Omega$)')
plt.ylabel(r'$V_\text{out}$')
plt.legend();
plt.show();
