2. The Calculus of Variations
Michael Fowler
Introduction
We’ve seen how Whewell solved the problem of the equilibrium shape of a chain hanging between two points, by finding how the forces on a length of chain (the tension at the two ends and its weight) balanced. We’re now going to look at a completely different approach: the equilibrium configuration is an energy minimum, so small deviations from it can only make second-order changes in the gravitational potential energy. Here we’ll find how analyzing that leads to a differential equation for the curve, and how the technique developed can be successfully applied to a vast array of problems.
The Catenary and the Soap Film
The catenary is the curved configuration of a uniform inextensible rope with two fixed endpoints at rest in a constant gravitational field. That is to say, it is the curve that minimizes the gravitational potential energy
$$\int_{x_1}^{x_2} y\,ds = \int_{x_1}^{x_2} y\sqrt{1+y'^2}\,dx,$$
where we have taken the rope density and gravity both equal to unity for mathematical convenience. Usually in calculus we minimize a function with respect to a single variable, or several variables. Here the potential energy is a function of a function, equivalent to an infinite number of variables, and our problem is to minimize it with respect to arbitrary small variations of that function. In other words, if we nudge the chain somewhere, and its motion is damped by air or internal friction, it will settle down again in the catenary configuration.
Formally speaking, there will be no change in that potential energy to leading order if we make an infinitesimal change in the curve, $y(x) \to y(x) + \delta y(x)$ (subject of course to keeping the length the same, that is, $\delta\int_{x_1}^{x_2}\sqrt{1+y'^2}\,dx = 0$).
This method of solving the problem is called the calculus of variations: in ordinary calculus, we make an infinitesimal change in a variable, and compute the corresponding change in a function, and if it’s zero to leading order in the small change, we’re at an extreme value.
(Nitpicking footnote: actually this assumes the second-order term is nonzero. What about $y = x^3$ near the origin? But such situations are infrequent in the problems we’re likely to encounter.)
The difference here is that the potential energy of the hanging chain isn’t just a function of a variable, or even of a number of variables: it’s a function of a function, depending on the position of every point on the chain (in the limit of infinitely small links, that is, or equivalently a continuous rope).
So, we’re looking for the configuration where the potential energy doesn’t change to first order for any infinitesimal change in the position of the curve, subject to fixed endpoints and a fixed chain length.
As a warm-up, we’ll consider a simpler, but closely related, problem.
A Soap Film Between Two Horizontal Rings: the Euler-Lagrange Equation
This problem is very similar to the catenary: surface tension will pull the soap film to the minimum possible total area compatible with the fixed boundaries (and neglecting gravity, which is a small effect).
(Interestingly, this problem is also closely related to string theory: as a closed string propagates, its path traces out a “world sheet”, and the string dynamics are determined by that sheet having minimal area.)
Taking the axis of rotational symmetry to be the $x$-axis, and the radius $y(x)$, we need to find the function $y(x)$ that minimizes the total area ($ds$ is measured along the curve of the surface). Think of the soap film as a sequence of rings or collars, of radius $y$ and width $ds = \sqrt{dx^2 + dy^2}$, and therefore area $2\pi y\,ds$. The total area is given by integrating, adding all these incremental collars,
$$A = 2\pi\int_{x_1}^{x_2} y\,ds = 2\pi\int_{x_1}^{x_2} y\sqrt{1+y'^2}\,dx,$$
subject to given values of $y$ at the two ends. (You might be thinking at this point: isn’t this identical to the catenary equation? The answer is yes, but the chain has an additional requirement: it has a fixed length. The soap film is not constrained in that way; it can stretch or contract to minimize the total area, so this is a different problem!)
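That is, we want $\delta A = 0$ to first order if we make a change $y(x) \to y(x) + \delta y(x)$. Of course, this also means $y'(x) \to y'(x) + \delta y'(x)$, where $\delta y'(x) = \frac{d}{dx}\delta y(x)$.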
General Method for the Minimization Problem
To emphasize the generality of the method, we’ll just write
$$A = \int_{x_1}^{x_2} f(y, y')\,dx.$$
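Then under any infinitesimal variation $\delta y(x)$ (equal to zero at the fixed endpoints)
$$\delta A = \int_{x_1}^{x_2}\left(\frac{\partial f}{\partial y}\delta y + \frac{\partial f}{\partial y'}\delta y'\right)dx = 0.$$
To make further progress, we write $\delta y' = \frac{d}{dx}\delta y$, then integrate the second term by parts, remembering $\delta y = 0$ at the endpoints, to get
$$\delta A = \int_{x_1}^{x_2}\left(\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'}\right)\delta y\,dx = 0.$$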
Since this is true for any infinitesimal variation, we can choose a variation which is only nonzero near one point in the interval, and deduce that
$$\frac{d}{dx}\frac{\partial f}{\partial y'} = \frac{\partial f}{\partial y}.$$
This general result is called the Euler-Lagrange equation. It’s very important: you’ll be seeing it again.
An Important First Integral of the Euler-Lagrange Equation
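It turns out that, since the function $f(y, y')$ does not contain $x$ explicitly, there is a simple first integral of this equation. Multiplying throughout by $y'$,
$$y'\frac{d}{dx}\frac{\partial f}{\partial y'} = y'\frac{\partial f}{\partial y}.$$
Since $f$ doesn’t depend explicitly on $x$, we have
$$\frac{df}{dx} = y'\frac{\partial f}{\partial y} + y''\frac{\partial f}{\partial y'},$$
and using this to replace $y'\,\partial f/\partial y$ in the preceding equation gives
$$y'\frac{d}{dx}\frac{\partial f}{\partial y'} + y''\frac{\partial f}{\partial y'} = \frac{df}{dx}, \quad\text{that is,}\quad \frac{d}{dx}\left(y'\frac{\partial f}{\partial y'} - f\right) = 0;$$
then multiplying by $-1$ (to match the equation as usually written) we have
$$\frac{d}{dx}\left(f - y'\frac{\partial f}{\partial y'}\right) = 0,$$
giving a first integral
$$f - y'\frac{\partial f}{\partial y'} = \text{constant}.$$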
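For the soap film between two rings problem (dropping the constant factor $2\pi$),
$$f(y, y') = y\sqrt{1+y'^2},$$
so the Euler-Lagrange equation is
$$\frac{d}{dx}\left(\frac{y\,y'}{\sqrt{1+y'^2}}\right) = \sqrt{1+y'^2},$$
and has first integral
$$y\sqrt{1+y'^2} - \frac{y\,y'^2}{\sqrt{1+y'^2}} = \frac{y}{\sqrt{1+y'^2}} = \text{constant}.$$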
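As a quick numerical sanity check of the soap-film Euler-Lagrange equation, here is a minimal Python sketch that evaluates its residual $\frac{d}{dx}\frac{\partial f}{\partial y'} - \frac{\partial f}{\partial y}$ for a trial profile, with $f = y\sqrt{1+y'^2}$. The helper name `el_residual`, the finite-difference step and the sample points are our own illustrative choices. The profile $y = \cosh x$ should satisfy the equation exactly (here $y\,y'/\sqrt{1+y'^2} = \sinh x$, whose derivative is $\cosh x = \sqrt{1+y'^2}$), so its residual comes out at roundoff level. A parabola with the same value, slope and curvature at $x = 0$ fails away from $x = 0$.

```python
import math

def el_residual(y, dy, x, h=1e-5):
    """Residual d/dx(df/dy') - df/dy of the soap-film Euler-Lagrange equation,
    with f(y, y') = y*sqrt(1 + y'^2), evaluated at the point x.
    y and dy are callables giving the trial profile and its exact slope;
    the outer d/dx is taken by a central finite difference of step h."""
    def df_dyprime(t):
        return y(t) * dy(t) / math.sqrt(1.0 + dy(t) ** 2)
    d_dx = (df_dyprime(x + h) - df_dyprime(x - h)) / (2.0 * h)
    df_dy = math.sqrt(1.0 + dy(x) ** 2)
    return d_dx - df_dy

# y = cosh x versus a parabola matching it to second order at x = 0
for x in (-0.8, 0.3, 1.2):
    r_cosh = el_residual(math.cosh, math.sinh, x)
    r_parab = el_residual(lambda t: 1 + t * t / 2, lambda t: t, x)
    print(f"x={x:+.1f}  cosh residual={r_cosh:.1e}  parabola residual={r_parab:.3f}")
```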
We’ll write
$$\frac{y}{\sqrt{1+y'^2}} = a,$$
with $a$ the constant of integration, which will depend on the endpoints.
This is a first-order differential equation, and can be solved.
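Rearranging,
$$\frac{dy}{dx} = \sqrt{\frac{y^2}{a^2} - 1},$$
or
$$x = \int\frac{a\,dy}{\sqrt{y^2 - a^2}}.$$
The standard substitution here is $y = a\cosh\xi$, so that $dy = a\sinh\xi\,d\xi$ and $\sqrt{y^2 - a^2} = a\sinh\xi$, from which $x = a\xi + b$, that is,
$$y = a\cosh\frac{x - b}{a}.$$
Here $b$ is the second constant of integration; the fixed endpoints determine $a$ and $b$.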
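We can also check numerically that the profile $y = a\cosh\frac{x-b}{a}$ makes the area stationary: perturbing it by $\epsilon\,\eta(x)$, with $\eta$ vanishing at both rings, should change $A/2\pi = \int y\sqrt{1+y'^2}\,dx$ only at order $\epsilon^2$. This is a minimal sketch under our own assumptions: rings at $x = \pm 1$ of radius $\cosh 1$ (so $a = 1$, $b = 0$), a hypothetical perturbation $\eta = \cos(\pi x/2)$, and a midpoint-rule integral. The change divided by $\epsilon^2$ should settle to a constant, and here it comes out positive, consistent with this profile being a minimum.

```python
import math

def film_area(eps, a=1.0, n=20000):
    """Midpoint-rule estimate of A / (2*pi) = integral of y*sqrt(1 + y'^2) dx
    over -1 <= x <= 1, for the profile y = a*cosh(x/a) + eps*cos(pi*x/2).
    The perturbation cos(pi*x/2) vanishes at both rings, x = -1 and x = 1."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        y = a * math.cosh(x / a) + eps * math.cos(math.pi * x / 2)
        dy = math.sinh(x / a) - eps * (math.pi / 2) * math.sin(math.pi * x / 2)
        total += y * math.sqrt(1.0 + dy * dy) * h
    return total

base = film_area(0.0)
for eps in (1e-1, 1e-2, 1e-3):
    change = film_area(eps) - base
    print(f"eps={eps:g}  change={change:.3e}  change/eps^2={change / eps**2:.4f}")
```

Dividing by $\epsilon$ instead of $\epsilon^2$ would give a ratio that goes to zero as $\epsilon$ shrinks. That is the numerical signature of a vanishing first-order variation.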
The Soap Film and the Chain
We see that the soap film profile function and the hanging chain have identical analytic form. This is not too surprising, because the potential energy of the hanging chain in simplified units is just
$$\int_{x_1}^{x_2} y\,ds = \int_{x_1}^{x_2} y\sqrt{1+y'^2}\,dx,$$
the same as the area function for the soap film. But there’s an important physical difference: the chain has a fixed length. The soap film is free to adjust its “length” to minimize the total area. The chain isn’t—it’s constrained. How do we deal with that?
Lagrange Multipliers
The problem of finding minima (or maxima) of a function subject to constraints was first solved by Lagrange. A simple example will suffice to show the method.
Imagine we have some smooth curve in the $(x, y)$ plane that does not pass through the origin, and we want to find the point on the curve that is its closest approach to the origin. A standard illustration is to picture a winding road through a bowl-shaped valley, and ask for the low point on the road. (We’ll also assume the road is well behaved: it doesn’t double back, etc. If it does, the method below gives a series of locally closest points to the origin, and we would need to go through them one by one to find the globally closest point.)
Let’s write the curve, the road, as $g(x, y) = 0$ (the wiggly red line in the figure below).
To find the closest approach point algebraically, we need to minimize $f(x, y) = x^2 + y^2$ (the square of the distance to the origin) subject to the constraint $g(x, y) = 0$.
In the figure, we’ve drawn curves
$$f(x, y) = x^2 + y^2 = c$$
for a range of values of $c$ (the circles centered at the origin).
We need to find the point of intersection of $g(x, y) = 0$ with the smallest circle it intersects, and it’s clear from the figure that it must touch that circle (if it crosses, it will necessarily get closer to the origin).
Therefore, at that point, the curves $f(x, y) = c$ and $g(x, y) = 0$ are parallel.
Therefore the normals to the curves are also parallel:
$$\vec\nabla f = \lambda\vec\nabla g.$$
(Note: yes, those are the directions of the normals. For an infinitesimal displacement $d\vec r$ along the curve $f(x, y) = \text{constant}$, $df = \vec\nabla f\cdot d\vec r = 0$, so the vector $\vec\nabla f$ is perpendicular to $d\vec r$. This is also analogous to the electric field being perpendicular to the equipotential surfaces.)
The constant $\lambda$ introduced here is called a Lagrange multiplier. It’s just the ratio of the lengths of the two normal vectors (of course, “normal” here means the vectors are perpendicular to the curves; they are not normalized to unit length!). We can find $\lambda$ in terms of $x, y$, but at this point we don’t know their values.
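The equations determining the closest approach to the origin can now be written:
$$\frac{\partial}{\partial x}\left(f - \lambda g\right) = 0, \qquad \frac{\partial}{\partial y}\left(f - \lambda g\right) = 0, \qquad \frac{\partial}{\partial \lambda}\left(f - \lambda g\right) = 0.$$
(The third equation is just $g(x, y) = 0$, meaning we’re on the road.)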
We have transformed a constrained minimization problem in two dimensions to an unconstrained minimization problem in three dimensions!
The first two equations can be solved to find $\lambda$ and the ratio $x : y$; the third equation then gives $x$ and $y$ separately.
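To see the “unconstrained problem in three dimensions” at work, here is a minimal sketch that solves the three stationarity equations $\partial_x(f - \lambda g) = 0$, $\partial_y(f - \lambda g) = 0$, $g = 0$ simultaneously for $(x, y, \lambda)$ with Newton’s method. The constraint $g(x, y) = xy - 1$ is our own illustrative choice, not the curve in the exercise below, and the helper names `solve_linear` and `closest_point` are hypothetical. Starting guesses near each branch of this hyperbola converge to the closest points $(1, 1)$ and $(-1, -1)$, both with $\lambda = 2$.

```python
def solve_linear(A, b):
    """Solve the small dense system A d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (M[r][n] - sum(M[r][c] * d[c] for c in range(r + 1, n))) / M[r][r]
    return d


def closest_point(x, y, lam, steps=30):
    """Newton's method on the three stationarity equations for
    f = x^2 + y^2 and the illustrative constraint g = x*y - 1."""
    for _ in range(steps):
        F = [2 * x - lam * y,    # d/dx (f - lam*g)
             2 * y - lam * x,    # d/dy (f - lam*g)
             x * y - 1]          # g itself (same roots as d/dlam (f - lam*g))
        J = [[2, -lam, -y],      # Jacobian of F with respect to (x, y, lam)
             [-lam, 2, -x],
             [y, x, 0]]
        dx, dy, dlam = solve_linear(J, [-v for v in F])
        x, y, lam = x + dx, y + dy, lam + dlam
    return x, y, lam

print(closest_point(1.2, 0.9, 1.8))    # approximately (1, 1, 2)
print(closest_point(-1.2, -0.9, 1.8))  # the other branch: approximately (-1, -1, 2)
```

Newton’s method only finds the stationary point nearest the starting guess, so on a curve with several local closest points you would still have to compare them, as noted above.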
Exercise for the reader: Work through this for (There are two solutions because the curve is a hyperbola with two branches.)
Lagrange multipliers are widely used in economics, and other useful subjects such as traffic optimization.
Lagrange Multiplier for the Chain
The catenary is generated by minimizing the potential energy of the hanging chain given above,
$$\int_{x_1}^{x_2} y\sqrt{1+y'^2}\,dx,$$
but now subject to the constraint of fixed chain length,
$$\int_{x_1}^{x_2}\sqrt{1+y'^2}\,dx = L.$$
The Lagrange multiplier method generalizes in a straightforward way from variables to variable functions. In the curve example above, we minimized $f(x, y)$ subject to the constraint $g(x, y) = 0$. What we need to do now is minimize $\int_{x_1}^{x_2} y\sqrt{1+y'^2}\,dx$ subject to the constraint $\int_{x_1}^{x_2}\sqrt{1+y'^2}\,dx - L = 0$.
For the minimum curve and the correct (so far unknown) value of $\lambda$, an arbitrary infinitesimal variation of the curve will give zero first-order change in $\int_{x_1}^{x_2}(y - \lambda)\sqrt{1+y'^2}\,dx$; we write this as
$$\delta\int_{x_1}^{x_2}(y - \lambda)\sqrt{1+y'^2}\,dx = 0.$$
Remarkably, the effect of the constraint is to give a simple adjustable parameter, the origin in the $y$ direction, so that we can satisfy the endpoint and length requirements.
The solution to the equation follows exactly the route followed for the soap film, leading to the first integral
$$\frac{y - \lambda}{\sqrt{1+y'^2}} = a,$$
with $a$ a constant of integration, which will depend on the endpoints.
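Rearranging,
$$\frac{dy}{dx} = \sqrt{\left(\frac{y - \lambda}{a}\right)^2 - 1},$$
or
$$x = \int\frac{a\,dy}{\sqrt{(y - \lambda)^2 - a^2}}.$$
The standard substitution here is $y - \lambda = a\cosh\xi$; we find
$$y = \lambda + a\cosh\frac{x - b}{a}.$$
Here $b$ is the second constant of integration; the fixed endpoints and length give $a$, $b$ and $\lambda$. In general, the equations must be solved numerically. To get some feel for why this will always work, note that changing $a$ varies how rapidly the cosh curve climbs from its low point $y = \lambda + a$ at $x = b$ (increasing $a$ “fattens” the curve); then by varying $b$ and $\lambda$ we can move that lowest point to the lowest point of the chain (or rather of the catenary, since it might be outside the range covered by the physical chain).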
Algebraically, we know the curve can be written as $y = a\cosh(x/a)$, although at this stage we don’t know the constant $a$ or where the origin is. What we do know is the length $L$ of the chain, and the horizontal and vertical distances $h = x_2 - x_1$ and $v = y_2 - y_1$ between the fixed endpoints. It’s straightforward to calculate that the length of the chain is
$$L = a\sinh\frac{x_2}{a} - a\sinh\frac{x_1}{a},$$
and the vertical distance between the endpoints is
$$v = a\cosh\frac{x_2}{a} - a\cosh\frac{x_1}{a},$$
from which
$$\sqrt{L^2 - v^2} = 2a\sinh\frac{h}{2a}.$$
All terms in this equation are known except $a$, which can therefore be found numerically. (This is in Wikipedia, among other places.)
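Here is one way to do that numerical step: a minimal bisection sketch for $\sqrt{L^2 - v^2} = 2a\sinh\frac{h}{2a}$. The function name `catenary_parameter` is hypothetical. Bisection is our choice of root finder. It works because, writing $u = h/2a$, the equation becomes $\sinh u / u = \sqrt{L^2 - v^2}/h$, and $\sinh u / u$ increases monotonically from 1. So a slack chain (longer than the straight-line distance between its ends) always has exactly one solution.

```python
import math

def catenary_parameter(L, h, v, iterations=200):
    """Solve sqrt(L^2 - v^2) = 2a sinh(h / (2a)) for the catenary parameter a,
    given chain length L and horizontal/vertical endpoint separations h, v."""
    target = math.sqrt(L * L - v * v) / h
    if target <= 1.0:
        raise ValueError("the chain must be longer than the straight-line distance")
    # With u = h / (2a) the equation reads sinh(u) / u = target, and
    # sinh(u) / u increases monotonically from 1, so bisection on u is safe.
    lo, hi = 0.0, 1.0
    while math.sinh(hi) / hi < target:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if math.sinh(mid) / mid < target:
            lo = mid
        else:
            hi = mid
    return h / (lo + hi)  # a = h / (2u), with u taken as the bracket midpoint

# Round trip: build L and v from y = 2 cosh(x/2) between x1 = 0.3 and x2 = 2.3
a_true, x1, x2 = 2.0, 0.3, 2.3
L = a_true * (math.sinh(x2 / a_true) - math.sinh(x1 / a_true))
v = a_true * (math.cosh(x2 / a_true) - math.cosh(x1 / a_true))
print(catenary_parameter(L, x2 - x1, v))  # recovers a close to 2
```

Once $a$ is known, either endpoint equation fixes $x_1$, and hence where the origin sits relative to the chain.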
Exercise: try applying this reasoning to finding $a$ for the soap film minimization problem. In that case, we know $h$ and $v$, and there is no length conservation requirement. To find $a$ we must eliminate the unknown $b$ from the equations
$$y_1 = a\cosh\frac{x_1 - b}{a}, \qquad y_2 = a\cosh\frac{x_2 - b}{a}.$$
This is not difficult but, in contrast to the chain, does not give $a$ in terms of $h$ and $v$; instead, $y_1$ and $y_2$ appear separately. Explain, in terms of the physics of the two systems, why this is so different from the chain.
The Brachistochrone
Suppose you have two points, A and B, with B below A, but not directly below. You have some smooth (let’s say frictionless) wire, and a bead that slides on the wire. The problem is to curve the wire from A down to B in such a way that the bead makes the trip as quickly as possible.
This optimal curve is called the “brachistochrone”, which is just the Greek for “shortest time”.
But what, exactly, is this curve? That is, what is $y(x)$, in the obvious notation?
This was the challenge problem posed by Johann Bernoulli to the mathematicians of Europe in a journal run by Leibniz in June 1696. Isaac Newton was working full time running the Royal Mint, recoining England, and hanging counterfeiters. Nevertheless, ending a full day’s work at 4 pm, and finding the problem delivered to him, he solved it by 4 am the next morning, and sent the solution anonymously to Bernoulli. Bernoulli remarked of the anonymous solution, “I recognize the lion by his clawmark”.
This was the beginning of the Calculus of Variations.
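Here’s how to solve the problem: we’ll take the starting point A to be the origin, and for convenience measure the $y$-axis positive downwards. This means the velocity at any point on the path is given by
$$\tfrac12 mv^2 = mgy, \qquad v = \sqrt{2gy}.$$
So measuring length along the path as $ds$ as usual, the time is given by
$$T = \int\frac{ds}{v} = \int_0^{x_B}\frac{\sqrt{1+y'^2}}{\sqrt{2gy}}\,dx.$$
Notice that this has the same form as the catenary equation, the only difference being that $y$ is replaced by $1/\sqrt{y}$ (dropping the constant factor $1/\sqrt{2g}$). The integrand does not depend on $x$, so we have the first integral:
$$f - y'\frac{\partial f}{\partial y'} = \frac{\sqrt{1+y'^2}}{\sqrt{y}} - \frac{y'^2}{\sqrt{y}\sqrt{1+y'^2}} = \frac{1}{\sqrt{y\left(1+y'^2\right)}} = \text{constant}.$$
That is,
$$y\left(1 + y'^2\right) = \text{constant},$$
so
$$y\left(1 + \left(\frac{dy}{dx}\right)^2\right) = 2a,$$
$2a$ being a constant of integration (the 2 proves convenient).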
Recalling that the curve starts at the origin A, it must begin by going vertically downward, since $y'^2 = 2a/y - 1 \to \infty$ as $y \to 0$. For small enough $y$, we can approximate by ignoring the 1, so $y\,y'^2 \approx 2a$, giving $y \approx (9a/2)^{1/3}x^{2/3}$. The curve must however become horizontal if it gets as far down as $y = 2a$, and it cannot go below that level.
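Rearranging in order to integrate,
$$x = \int_0^y\sqrt{\frac{y}{2a - y}}\,dy.$$
This is not a very appealing integrand. It looks a little nicer on writing $y = a(1 - z)$, so that $dy = -a\,dz$:
$$x = -a\int\sqrt{\frac{1 - z}{1 + z}}\,dz.$$
Now what? We’d prefer for the expression inside the square root to be a perfect square, of course. You may remember from high school trig that $\cos\theta = 1 - 2\sin^2\frac{\theta}{2} = 2\cos^2\frac{\theta}{2} - 1$. This gives immediately that
$$\frac{1 - \cos\theta}{1 + \cos\theta} = \tan^2\frac{\theta}{2},$$
so the substitution $z = \cos\theta$ is what we need.
Then
$$x = a\int\tan\frac{\theta}{2}\sin\theta\,d\theta = a\int 2\sin^2\frac{\theta}{2}\,d\theta = a\int(1 - \cos\theta)\,d\theta.$$
This integrates to give
$$x = a(\theta - \sin\theta), \qquad y = a(1 - \cos\theta),$$
where we’ve fixed the constant of integration so that the curve goes through the origin (at $\theta = 0$).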
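A quick consistency check, as a minimal sketch: on the parametric curve $x = a(\theta - \sin\theta)$, $y = a(1 - \cos\theta)$, the slope is $dy/dx = \sin\theta/(1 - \cos\theta)$. The first integral $y(1 + y'^2)$ should then equal $2a$ for every $\theta$. The helper name and sample values below are illustrative.

```python
import math

def first_integral_on_cycloid(theta, a=1.0):
    """Evaluate y * (1 + y'^2) at parameter theta on x = a(theta - sin theta),
    y = a(1 - cos theta); the brachistochrone first integral says this is 2a."""
    y = a * (1 - math.cos(theta))
    dydx = math.sin(theta) / (1 - math.cos(theta))  # (dy/dtheta) / (dx/dtheta)
    return y * (1 + dydx * dydx)

for theta in (0.3, 1.0, 2.0, 3.0):
    print(theta, first_integral_on_cycloid(theta, a=0.7))  # each close to 1.4
```

Algebraically, $(1 - \cos\theta)^2 + \sin^2\theta = 2(1 - \cos\theta)$, which is why the value is $2a$ for every $\theta$.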
To see what this curve looks like, first ignore the term $a\theta$ in $x$, leaving $x = -a\sin\theta$, $y = a - a\cos\theta$. Evidently as $\theta$ increases from zero, the point goes anticlockwise around a circle of radius $a$ centered at $(0, a)$, that is, touching the $x$-axis at the origin.
Now adding the $a\theta$ back in, this circular motion moves steadily to the right, in such a way that the initial direction of the path is vertically down. (For very small $\theta$, the leftward motion around the circle and the rightward drift cancel to first order: $x \approx a\theta^3/6$, while $y \approx a\theta^2/2$.)
Visualizing the total motion as $\theta$ steadily increases, the center moves from its original position at $(0, a)$ to the right at a rate $a$ per unit increase of $\theta$. Meanwhile, the point is moving round the circle anticlockwise at this same rate. Putting together the center’s linear velocity with the corresponding angular velocity, we see the motion is the path of a point on the rim of a wheel rolling without sliding along a road (upside down in our case, of course). This is a cycloid.
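To see the payoff numerically, here is a minimal sketch comparing the descent time $T = \int ds/\sqrt{2gy}$ along the cycloid with $a = 1$, which ends at $B = (\pi a, 2a)$, against a straight wire to the same point. The value $g = 9.81$, the helper `travel_time` and the path functions are our own assumptions. Sampling the path at $s = t^2$ tames the $1/\sqrt{y}$ singularity at the start. The cycloid’s $y$ is written as $2a\sin^2(\theta/2)$ to avoid rounding $1 - \cos\theta$ to zero for tiny $\theta$.

```python
import math

G = 9.81  # assumed value of g (m/s^2); y is measured downwards from the start

def travel_time(path, n=20000):
    """Approximate T = integral of ds / sqrt(2 g y) for a bead released from rest
    at the origin and sliding along path(s), 0 <= s <= 1, with path(0) = (0, 0).
    Sampling at s = t**2 removes the 1/sqrt(y) singularity at the start."""
    total = 0.0
    x_prev, y_prev = path(0.0)
    for k in range(1, n + 1):
        x, y = path((k / n) ** 2)
        _, y_mid = path(((k - 0.5) / n) ** 2)  # speed taken at the segment midpoint
        total += math.hypot(x - x_prev, y - y_prev) / math.sqrt(2 * G * y_mid)
        x_prev, y_prev = x, y
    return total

a = 1.0  # cycloid parameter; the endpoint is B = (pi * a, 2 * a)

def cycloid(s):
    theta = math.pi * s
    # 2 sin^2(theta/2) equals 1 - cos(theta) but avoids cancellation near theta = 0
    return a * (theta - math.sin(theta)), 2 * a * math.sin(theta / 2) ** 2

def straight_line(s):
    return math.pi * a * s, 2 * a * s

print("cycloid:      ", travel_time(cycloid), "  closed form:", math.pi * math.sqrt(a / G))
print("straight line:", travel_time(straight_line))
```

With these numbers the cycloid wins by roughly 16% over the straight line, even though the straight line is the shorter path.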
Fastest Curve for Given Horizontal Distance
Suppose we want to find the curve a bead slides down to minimize the time from the origin to some specified horizontal displacement X, but we don’t care what vertical drop that entails.
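Recall how we derived the equation for the curve: the time taken is
$$T = \int_0^X f(y, y')\,dx, \qquad f(y, y') = \frac{\sqrt{1+y'^2}}{\sqrt{2gy}}.$$
At the minimum, under any infinitesimal variation $\delta y(x)$,
$$\delta T = \int_0^X\left(\frac{\partial f}{\partial y}\delta y + \frac{\partial f}{\partial y'}\delta y'\right)dx = 0.$$
Writing $\delta y' = \frac{d}{dx}\delta y$, and integrating the second term by parts,
$$\delta T = \int_0^X\left(\frac{\partial f}{\partial y} - \frac{d}{dx}\frac{\partial f}{\partial y'}\right)\delta y\,dx + \left[\frac{\partial f}{\partial y'}\delta y\right]_0^X = 0.$$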
In the earlier treatment, both endpoints were fixed, so we dropped that final term.
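However, we are now trying to find the fastest time for a given horizontal distance, so the final vertical distance is an adjustable parameter: $\delta y(X) \neq 0$ in general (we still have $\delta y(0) = 0$, since the bead starts at the origin).
As before, since $\delta T = 0$ for arbitrary $\delta y$, we can still choose a $\delta y$ which is only nonzero near some point not at the end, so we must still have
$$\frac{d}{dx}\frac{\partial f}{\partial y'} = \frac{\partial f}{\partial y}.$$
However, we must also have $\left.\frac{\partial f}{\partial y'}\right|_{x = X}\delta y(X) = 0$ to first order for arbitrary infinitesimal $\delta y(X)$ (imagine a variation only nonzero near the endpoint), and this can only be true if $\partial f/\partial y' = 0$ at $x = X$.
For the brachistochrone,
$$\frac{\partial f}{\partial y'} = \frac{y'}{\sqrt{2gy}\sqrt{1+y'^2}},$$
so $\partial f/\partial y' = 0$ at $x = X$ means that $y'(X) = 0$: the curve is horizontal at the end.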
So the curve that delivers the bead a given horizontal distance the fastest is the half-cycloid (inverted), flat at the end. It’s easy to see this fixes the curve uniquely: think of the curve as generated by a rolling wheel; one half-turn of the wheel takes the top point to the bottom in distance $X$.
Exercise: how low does it go?
The Perfect Pendulum
Around the time of Newton, the best timekeepers were pendulum clocks, but the time of oscillation of a simple pendulum depends on its amplitude, although of course the correction is small for small amplitude. The pendulum takes longer for larger amplitude. This can be corrected for by having the string constrained between enclosing surfaces that steepen the pendulum’s path for larger amplitudes, and thereby speed it up.
It turns out (and was proved geometrically by Newton) that the ideal pendulum path is a cycloid. Thinking in terms of the equivalent bead on a wire problem, with a symmetric cycloid replacing the circular arc of an ordinary pendulum, if the bead is let go from rest at any point on the wire, it will reach the center in the same time as from any other point. So a clock with a pendulum constrained to such a path will keep very good time, and not be sensitive to the amplitude of swing.
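The proof involves similar integrals and tricks to those used above. Releasing the bead from rest at depth $y_0$ (with $y$ still measured downwards), its speed is $v = \sqrt{2g(y - y_0)}$, so the time to reach the bottom, $y = 2a$, is
$$T = \int_{y_0}^{2a}\frac{ds}{\sqrt{2g(y - y_0)}},$$
and with the parameterization above, $x = a(\theta - \sin\theta)$, $y = a(1 - \cos\theta)$ (so that $ds = a\sqrt{2(1 - \cos\theta)}\,d\theta$ and $y_0 = a(1 - \cos\theta_0)$), the integral becomes
$$T = \sqrt{\frac{a}{g}}\int_{\theta_0}^{\pi}\sqrt{\frac{1 - \cos\theta}{\cos\theta_0 - \cos\theta}}\,d\theta.$$
As before, we can now write $1 - \cos\theta = 2\sin^2\frac{\theta}{2}$, $\cos\theta = 2\cos^2\frac{\theta}{2} - 1$, etc., to find that $T$ is in fact independent of $\theta_0$.
This is left as an exercise for the reader. (Hint: you may find
$$\int_0^b\frac{dx}{\sqrt{b^2 - x^2}} = \frac{\pi}{2}$$
to be useful. Can you prove this integral is correct? Why doesn’t it depend on $b$?)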
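Before doing the exercise, you might like to see the claim numerically. This minimal sketch evaluates the integral for $T$ above for several release points $\theta_0$. It substitutes $\theta = \theta_0 + (\pi - \theta_0)t^2$, which removes the inverse-square-root singularity at the release point, and then applies the midpoint rule in $t$. The values $a = 1$, $g = 9.81$ and the helper name `descent_time` are illustrative assumptions. The printed times should agree to many digits, whatever the release point.

```python
import math

def descent_time(theta0, a=1.0, g=9.81, n=20000):
    """Time for a bead released from rest at parameter theta0 on the cycloid
    x = a(theta - sin theta), y = a(1 - cos theta) to reach the bottom, theta = pi:
    T = sqrt(a/g) * integral_{theta0}^{pi} sqrt((1 - cos th)/(cos theta0 - cos th)) d th."""
    span = math.pi - theta0
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n                   # midpoint rule in t on [0, 1]
        theta = theta0 + span * t * t       # theta = theta0 + span * t^2 removes the
        dtheta_dt = 2.0 * span * t          # inverse-square-root singularity at theta0
        # 1 - cos(th) = 2 sin^2(th/2);
        # cos(theta0) - cos(th) = 2 sin((th + theta0)/2) sin((th - theta0)/2)
        num = math.sin(theta / 2) ** 2
        den = math.sin((theta + theta0) / 2) * math.sin((theta - theta0) / 2)
        total += math.sqrt(num / den) * dtheta_dt / n
    return math.sqrt(a / g) * total

for theta0 in (0.0, 0.5, 1.5, 2.5, 3.0):
    print(f"theta0={theta0:.1f}  T={descent_time(theta0):.9f}")
```

Writing the cosine difference as a product of sines avoids subtracting two nearly equal cosines just after release.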
Exercise: As you well know, a simple harmonic oscillator, a mass on a linear spring with restoring force $-kx$, has a period independent of amplitude. Does this mean that a particle sliding on a cycloid is equivalent to a simple harmonic oscillator? Find out by expressing the motion as an equation in which the distance variable $s$ from the origin is measured along the curve.
Calculus of Variations with Many Variables
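We’ve found the equations defining the curve along which the integral
$$\int_{x_1}^{x_2} f(y, y')\,dx$$
has a stationary value, and we’ve seen how it works in some two-dimensional curve examples.
But most dynamical systems are parameterized by more than one variable, so we need to know how to go from a curve in $(x, y)$ to one in a space $(x, y_1, y_2, \ldots, y_n)$, and we need to minimize (say)
$$\int_{x_1}^{x_2} f(y_1, \ldots, y_n, y_1', \ldots, y_n')\,dx.$$
In fact, the generalization is straightforward: the path deviation simply becomes a vector,
$$\delta\vec y(x) = \left(\delta y_1(x), \delta y_2(x), \ldots, \delta y_n(x)\right).$$
Then under any infinitesimal variation (writing also $\delta y_i' = \frac{d}{dx}\delta y_i$)
$$\delta\int_{x_1}^{x_2} f\,dx = \int_{x_1}^{x_2}\sum_{i=1}^{n}\left(\frac{\partial f}{\partial y_i}\delta y_i + \frac{\partial f}{\partial y_i'}\delta y_i'\right)dx = 0.$$
Just as before, we take the variation zero at the endpoints, and integrate by parts to get now $n$ separate equations for the stationary path:
$$\frac{d}{dx}\frac{\partial f}{\partial y_i'} = \frac{\partial f}{\partial y_i}, \qquad i = 1, \ldots, n.$$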
Multivariable First Integral
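Following and generalizing the one-variable derivation, multiplying the above equations one by one by the corresponding $y_i'$, we have the $n$ equations
$$y_i'\frac{d}{dx}\frac{\partial f}{\partial y_i'} = y_i'\frac{\partial f}{\partial y_i}.$$
Since $f$ doesn’t depend explicitly on $x$, we have
$$\frac{df}{dx} = \sum_{i=1}^{n}\left(y_i'\frac{\partial f}{\partial y_i} + y_i''\frac{\partial f}{\partial y_i'}\right),$$
and just as for the one-variable case, these equations give
$$\frac{d}{dx}\left(\sum_{i=1}^{n} y_i'\frac{\partial f}{\partial y_i'} - f\right) = 0,$$
and the (important!) first integral
$$\sum_{i=1}^{n} y_i'\frac{\partial f}{\partial y_i'} - f = \text{constant}.$$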
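To watch the multivariable first integral hold along an actual stationary path, here is a minimal sketch. It assumes a hypothetical two-variable integrand $f = \tfrac12(y_1'^2 + y_2'^2) - V(y_1, y_2)$ with $V = \tfrac12(y_1^2 + y_2^2) + y_1^2 y_2$, chosen only because it couples the two variables nonlinearly. The Euler-Lagrange equations are then $y_i'' = -\partial V/\partial y_i$. We integrate them in $x$ with a classical fourth-order Runge-Kutta step, which is our own numerical choice. Since $\partial f/\partial y_i' = y_i'$, the first integral is $\sum_i y_i'^2 - f = \tfrac12(y_1'^2 + y_2'^2) + V$, and it should stay fixed while $y_1$ and $y_2$ themselves change.

```python
def V(y1, y2):
    """Illustrative (hypothetical) coupled potential."""
    return 0.5 * (y1 * y1 + y2 * y2) + y1 * y1 * y2

def f(state):
    # Integrand f(y1, y2, y1', y2') = (y1'^2 + y2'^2)/2 - V(y1, y2); no explicit x
    y1, y2, p1, p2 = state
    return 0.5 * (p1 * p1 + p2 * p2) - V(y1, y2)

def first_integral(state):
    # sum_i y_i' * df/dy_i' - f, where df/dy_i' = y_i' for this f
    _, _, p1, p2 = state
    return p1 * p1 + p2 * p2 - f(state)

def derivs(state):
    # Euler-Lagrange: d/dx(df/dy_i') = df/dy_i, i.e. y_i'' = -dV/dy_i
    y1, y2, p1, p2 = state
    return (p1, p2, -(y1 + 2 * y1 * y2), -(y2 + y1 * y1))

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step of size h in x."""
    k1 = derivs(state)
    k2 = derivs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = derivs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = derivs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                 for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))

state = (0.1, 0.0, 0.0, 0.1)   # (y1, y2, y1', y2') at x = 0
start = first_integral(state)
for _ in range(5000):          # integrate to x = 50 with step 0.01
    state = rk4_step(state, 0.01)
print(start, first_integral(state))  # the two values agree closely
```

If $x$ is read as time, this first integral is the familiar conserved energy, kinetic plus potential.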