Linear Algebra
William F. Moss1
1 Least Squares Linear Models
Suppose we are given a set of m data points
(t1,y1),(t2,y2),¼,(tm,ym) |
|
which we believe to be samples from some underlying function. Our
problem is to approximate the underlying function using
all available information.
In general, this is a difficult problem because we may know very
little about the underlying function
and the data may be corrupted with noise.
There are two common approaches to approximation of the underlying
function: interpolation and curve fitting.
In interpolation a function, called an interpolant, is constructed
whose graph passes through the data points. In curve fitting
a function is constructed which passes "close to" the data points but
not necessarily through them. This function is said to "fit" the
data. When a significant amount of noise
is present in the data, an interpolant may not provide a
satisfactory approximation of the underlying function
because the interpolant may oscillate rapidly. In this case, a slowly
varying function that "fits" the data may provide
a better approximation of the underlying function.
A linear model of the form
f(t;c) = |
n å
j = 1
|
cj fj(t) |
| (1) |
is often used to interpolate or fit data. It is a linear combination of
"basis" functions,
f1(t), ¼, fn(t),
which are selected based on practical considerations, visual inspection
of the data, and knowledge of the processes giving rise to the data.
The linear model depends on an independent variable t and on a set
of n coefficients, c1,¼,cn. The
c in f(t;c)
denotes a column vector containing these coefficients.
In interpolation, the number of basis functions, n, is chosen to be
equal to the number of data points, m, while for curve fitting
n is typically much smaller than m. We will assume that
n £ m.
As an example, suppose a straight line is to be used to fit m ³ 2
data points. We set n = 2 and
f1(t) = 1 and f2(t) = t. Then f has the form
f(t;c) = c1 + c2t.
Generally, this line will be an interpolant only in case m = 2.
A straight line fit is often used in statistics where this line is
called the regression line.
If the linear model is constructed so that it changes much more
slowly than an interpolant, then curve fitting
can be used as a noise filter. This approach is illustrated in the first
problem of this project. Here, samples taken from a straight line are corrupted
with noise generated randomly, and then a straight line is fit to the data.
The original line is very nearly recovered.
In the mathematical discussion below, we will
use the following matrix notation which was introduced in the
Orthogonal Methods course outline. Let
y = [y1,¼,ym]T, c = [c1,¼,cn]T ,A = (aij) = (fj(ti)). |
|
The m ×n matrix A is often called the design matrix.
The interpolation problem is to find c so that the
interpolation equations
have a solution.
Substituting equation (1) into (2), we find that
f(ti;c) = |
n å
j = 1
|
cj fj(ti) = ( A c )i, i = 1,¼,m |
|
which is a system of m equations and n unknowns with the matrix form
The curve fitting problem is to find c so that the graph of
f passes "close to" the data points. The most commonly used
method for defining "close to" is called the least squares method.
In the least squares method, c is determined
by minimizing the sum of the squares of the vertical deviations between
the graph of f and the data points. The problem is to find
cls so that
|
m å
i = 1
|
[f(ti;cls)-yi]2 = |
min
c Î IRn ×1
|
|
m å
i = 1
|
[f(ti;c)-yi]2. |
| (4) |
We can convert (4) into matrix form as follows.
First, note that
f(ti;c) = |
n å
j = 1
|
cj fj(ti) = (A c)i and |
m å
i = 1
|
[(A c)i - yi ]2 = ||A c - y||22. |
| (5) |
From (4) and (5) we see that
|
m å
i = 1
|
[f(ti;cls)-yi]2 £ |
m å
i = 1
|
[f(ti;c)-yi]2 for all c |
| (6) |
implies that
||A cls - y||22 £ ||A c - y||22 for all c. |
| (7) |
Taking the square
root of both sides of (7), yields the standard matrix form
of the linear least squares problem.
Find cls so that
||A cls - y||2 = |
min
c Î IRn ×1
|
||A c - y||2. |
| (8) |
Figure
Figure 1: The m = 3, n = 2 case.
Figure 1 provides geometric insight into the problem
of solving (3) and (8)
for the case m = 3 and n = 2. Here
y Î IR3 ×1 and A = [a1,a2]
is 3 ×2 with two
linearly independent columns,
a1 and a2. Now Ran(A) is
the span{a1,a2}, that is, the
set of all linear combinations of the columns of A.
It is a two-dimensional subspace of IR3 ×1
and is visualized in Figure 1
by the plane which passes through the origin and contains
a1,a2.
Also, for any
c Î IR2 ×1,
A c = c1 a1+c2 a2 Î Ran(A).
Now if y is
not in the plane Ran(A), as is the case in Figure 1, the
linear system (3) and the interpolation problem will not
have a solution.
On the other hand, the least squares problem always
has a solution cls.
The vector A cls is the
vector in the plane Ran(A), which is closest to y. Also,
the vector y - Acls
will be orthogonal to the
the plane Ran(A). Because the columns of A are linearly independent,
or equivalently because rank(A) = 2, cls is the only
solution to the least squares problem. Now imagine a case where
a1 and a2 lie along the same line. In this case,
Ran(A) is a one-dimensional
subspace of IR3 ×1 and can be visualized by a line passing
through the origin. Now there are infinitely many ways
to describe a vector in Ran(A) as a linear combination of a1
and a2, and so there are infinitely many solutions to the
least squares problem. We would prefer to have a unique linear
model which fits the data. This will happen if the basis functions
used in the linear model are chosen so that the columns of A
are linearly independent.
The m > 3 case works the same ways as the m = 3 case.
For c, A c Î Ran(A),
a subspace of IRm ×1. In the least squares problem, we
find the vector cls so that A cls Î Ran(A)
is as close to the vector y as possible. Because we have assumed that
n £ m, rank(A) £ n.
If the basis functions have been well chosen, the columns of A will
be linearly independent and rank(A) = n. In this case, (8)
is said to be a full rank least squares problem and
has a unique solution.
If rank(A) < n, (8) is said to be rank deficient and
has infinitely many solutions.
In Orthogonal Methods course outline,
we have see that the singular value decomposition (SVD)
of A can be used to solve the least squares problem. We
note that in the m = n case, it can also be used to solve
(3), although Gauss elimination with partial pivoting
is more frequently used. In the Orthogonal Methods course outline a
formula is given for cls.
In the rank deficient case, this formula gives the least squares
solution which has the smallest 2-norm. There are two other methods that
are commonly used to solve the full rank least squares problem:
QR factorization with column pivoting followed by the solution of
a triangular system, and solution of the normal equations.
However, the SVD approach is regarded as the most computationally reliable
method.
As explained in the instructions for this project, you are to create an M-file
with filename P1loginid.m containing MATLAB commands and statements to solve
the following problems. Note that you can cut MATLAB code from this PDF document
and paste it into the MATLAB editor.
Problem 1 In this problem you will fit noisy data using a line.
We generate
the data by taking 20 uniformly spaced samples from the line y = 1 + 2t for
t Î [0,1] and adding a small amount of random noise to each sample.
Add the following lines to P1loginid.m.
% Project 2. Last Name First Name. GroupXX
echo on
% Problem 1
% create m > 2 noisy data points lying close to a line
m = 20;
t = linspace(0,1,m)';
epsilon = .2; % Maximum noise amplitude
y = 1 + 2*t + epsilon*rand(m,1);
% Use the model c(1) + c(2)*t
% Construct the design matrix A.
A = [ones(m,1),t];
% Use the SVD to solve the least squares problem.
c = pinv(A)*y
if rank(A) == 2
disp('The design matrix is full rank')
else
disp('The design matrix is rank deficient.')
end
% Since A is full rank, we have found the unique
% solution to the full rank least squares problem.
% When the design matrix is rank deficient, the
% least squares problem has infinitely many
% solutions. The above computation would produce
% the solution that has the smallest 2-norm.
%
% Plot
tt = linspace(0,1,100)';
yy = c(1) + c(2)*tt;
plot(t,y,'o',tt,yy)
At the MATLAB prompt, type P2loginid to run your M-file. Note that a semi-colon
at the end of a line suppresses screen output for that line.
You should see your plot in a window labeled Figure No. 1.
Next, add three more lines for Problem 1.
Use the MATLAB commands "xlabel" (help xlabel),
ÿlabel" (help ylabel),
and "title" (help title) to label the
x and y axes and to add explanatory text at the top of the plot.
Run your M-file again. Notice that the solution to the least squares problem
is a coefficient vector which is close to [1,2].
Problem 2 The water level in the North Sea is mainly
determined by the so-called
M2-tide, whose period is about 12 hours. A reasonable but simple model
for predicting the level at any time t is a function of the form
h(t) = c1 f1(t) + c2 f2(t) +c3 f3(t), t in hours. |
|
where
|
|
f2(t) = sin |
æ ç
è
|
2 pt 12
|
ö ÷
ø
|
|
|
f3(t) = cos |
æ ç
è
|
2 pt 12
|
ö ÷
ø
|
. |
|
|
|
|
Here we are using basis functions which are periodic with period
12 hours. Does this seem reasonable?
We wish to estimate the values of the unknown coefficients c1,
c2, and c3
from a set of measured values of water level y1, ¼, ym made at
corresponding times t1, ¼, tm by solving a linear least
squares problem as discussed above.
From calculus we know that determining c1, c2,
and c3 corresponds to estimating
the mean water level (bias), the amplitude of variation about the mean, and
the phase of the oscillatory water level.
Suppose we have made the measurements:
tj | 0 | 2 | 4 | 6 | 8 | 10 | hours |
yj | 1.0 | 1.6 | 1.4 | 0.6 | 0.2 | 0.8 | meters |
Add the following lines to your M-file to solve this linear least
squares problem having m = 6 data points and n = 3 basis functions.
% Problem 2
m = 6;
t = [0;2;4;6;8;10]; % hours
y = [1;1.6;1.4;0.6;0.2;0.8]; % meters
% design matrix
A = [ones(m,1),sin(pi*t/6),cos(pi*t/6)];
c = pinv(A)*y % least squares solution
% Plot the data and the model
% plot vectors
tt = linspace(0,10,100)';
yy = [ones(100,1),sin(pi*tt/6),cos(pi*tt/6)]*c;
figure % create a new plot window
plot(t,y,'o',tt,yy)
At the MATLAB prompt, type P2loginid to run your M-file.
You should see your plot for problem 2 in a window labeled Figure No. 2.
Next, add three more lines for Problem 2.
Use the MATLAB commands "xlabel" (help xlabel),
ÿlabel" (help ylabel),
and "title" (help title) to label the
x and y axes and to add explanatory text at the top of the plot.
Run your M-file again. Notice that the model passes "close to" the data
points but does not interpolate them.
Problem 3 The interpolation equations (2)
can be satisfied if and only if the linear system (3)
has a solution.
If n < m, there will be more equations than unknowns and this overdetermined
system will not generally have a solution. This is clear in Figure 1
where m = 3 and n = 2. Show that this is the case
for the example in Problem 2 by applying the rank test from the
Linear Systems and Matrices Outline.
You must compare the rank of A to the rank of [A,y].
Problem 4 Add lines to your M-file to
find a straight line fit to the following
data and plot a graph of the straight line model
along with the data on the same set of axes. Label your graph.
Note that here we have (x,y) data instead of (t,y) data.
Density of ore x | 2.8 | 2.9 | 3.0 | 3.1 | 3.2 | 3.2 | 3.2 | 3.2 | 3.3 | 3.4 |
Iron content y(%) | 30 | 26 | 33 | 31 | 33 | 35 | 33 | 37 | 36 | 33 |
Problem 5 Balancing a Chemical Reaction.
2
Balance the chemical reaction
x1 C u2 S + x2 H+ + x3 N O3- ®x4 C u2+ + x5 N O + x6 S8 + x7 H2 O |
|
by calculating the smallest possible, positive integer coefficients
x1,¼,x7. The principles are that the number of atoms of each
element must be conserved as well as the total electric charge.
This gives six equations for the seven coefficients. Put these
equations in the form of a homogeneous system Ax = 0.
The null space N(A) has dimension one. Find a basis vector for
this null space. Find the solution in the form cv where
v is your basis vector and c is a scalar.
Remember that the coefficients must be positive integers and they
must be as small as possible. Hint: it is possible to solve this problem
using the MATLAB functions augmovie, or ref and slash, or null.
Problem 6 The following problem arises later in the course
when we model heat conduction in a rod and a vibrating string.
Find a scalar l (eigenvalue) and a nontrivial function
X(x) (eigenfunction) so that
This problem is sometimes referred to as an eigenvalue problem for an
ordinary differential equation. It is also called a Sturm-Liouville problem.
It can be shown that there are infinitely many positive eigenvalues.
We want to approximate the smallest
one, which the Sturm-Liouville theory says is p2.
We will reduce this problem to a matrix eigenvalue problem using a
centered finite difference approximation for X¢¢.
First, we discretize the interval [0,1] with mesh points
xi = ih, i = 0,¼,n+1, and h = 1/(n+1). Next, we use the
approximation
X¢¢(xi) » |
X(xi+1) - 2X(xi) + X(xi-1) h2
|
for i = 1,¼,n, |
| (12) |
to derive a matrix
eigenvalue problem by evaluating (9) at x1,¼,xn,
and using (10), (11), and (12).
This problem has the form
where Z = [X(x1),¼,X(xn)]T and A has the number 2 on the
main diagonal, the number -1 on the first subdiagonal and first
superdiagonal, and is zero elsewhere.
Add the following lines to your M-file to solve this problem.
Experiment with the value of n.
% Problem 6
n = 100;
h = 1/(n+1); e = ones(n,1)/h^2;
A = spdiags([-e 2*e -e],-1:1,n,n);
z = eig(A); format long
y = sort(z)/pi^2;
y(1)
Footnotes:
1 Department of Mathematical Sciences, Clemson
University, Clemson, SC 29634-1907, U.S.A.
(bmoss@math.clemson.edu).
Copyright 1995,1996 William F. Moss. All rights reserved.
2 This problem appears
in FirstLeaves: A Tutorial Introduction, B. W. Char
et. al., Springer 1992, page 197 (ISBN 0-387-97621-3).
File translated from TEX by TTH, version 2.22.
On 13 May 1999, 00:45.