62  Special Relativity: Kinematics

    Michael Fowler, UVa

A Quicker Derivation of the Lorentz Transformations

The Modern Physics lectures you just reviewed (I hope) presented a derivation of the Lorentz transformations between two parallel frames $S$, $S'$, with $S'$ moving at constant speed $v$ along the common $x$ axis relative to $S$, both taking the zero of time to be when the origins coincide. We followed Einstein’s thought experiments, all based on the assumption that the speed of light is the same in all inertial frames.

We’ll now show how, assuming the invariance of the speed of light, and that the equations are linear (like the Galilean ones), and that the time as observed on a receding clock doesn’t depend on the direction in which the clock is receding, we can derive the equations quite easily.

The Lorentz transformations relate the coordinates $(ct, x, y, z)$ of an event in one inertial frame $S$ to those $(ct', x', y', z')$ in another inertial frame $S'$.  We’ll take the corresponding axes in the two frames to be parallel, and the relative frame velocity to be along the $x$-axis.  And we’ll write the time coordinate as $ct$ so all coordinates have the same dimension. (This must be so, of course, but often units with $c=1$ are chosen, so it’s not always apparent. Or, as in some GR books, time is measured in meters.)

Taking the frame origins to coincide at $t = t' = 0$, the origin $O'$ in $S'$ must therefore correspond to $x = vt$ in $S$.

Let’s now assume that the transformation from $(ct, x)$ to $(ct', x')$ is linear (and we’ll assume $y, z$ are unchanged).  Then we can write:

$$\begin{pmatrix} ct' \\ x' \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \begin{pmatrix} ct \\ x \end{pmatrix}.$$

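Imagine now the path of a flash of light emitted at the origin at the instant the two origins coincided.  A flash moving in the positive direction is given by $x = ct$ and also by $x' = ct'$, so one must imply the other.  Putting this in the equation gives $\alpha + \beta = \gamma + \delta$. Now consider the flash moving in the negative direction, $x = -ct$, $x' = -ct'$: this gives $\alpha - \beta = \delta - \gamma$.   Putting these equations together we find $\delta = \alpha$, $\gamma = \beta$:

$$\begin{pmatrix} ct' \\ x' \end{pmatrix} = \begin{pmatrix} \alpha & \beta \\ \beta & \alpha \end{pmatrix} \begin{pmatrix} ct \\ x \end{pmatrix}.$$

Next, remembering that the origin $x' = 0$ in $S'$ corresponds to $x = vt$ in $S$, and since $x' = \beta ct + \alpha x$, it follows that $\beta c = -v\alpha$, so

$$\begin{pmatrix} ct' \\ x' \end{pmatrix} = \begin{pmatrix} \alpha & -v\alpha/c \\ -v\alpha/c & \alpha \end{pmatrix} \begin{pmatrix} ct \\ x \end{pmatrix}.$$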

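Imagine now a clock at the origin in $S$. The time $t' = \alpha t$ observed on that clock from the moving frame $S'$ cannot depend on the sign of the relative velocity, so $\alpha(v) = \alpha(-v)$, and therefore the inverse of the above transformation must have the same form (the same $\alpha$) with only the sign of $v$ changed.  Now, performing a transformation to a frame moving at $v$, then one to $-v$, gets you back to the original frame:

$$\begin{pmatrix} \alpha & -v\alpha/c \\ -v\alpha/c & \alpha \end{pmatrix} \begin{pmatrix} \alpha & v\alpha/c \\ v\alpha/c & \alpha \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

from which $\alpha^2 (1 - v^2/c^2) = 1$, so $\alpha = 1/\sqrt{1 - v^2/c^2}$, and the Lorentz transformations follow:

$$t' = \frac{t - vx/c^2}{\sqrt{1 - v^2/c^2}}, \qquad x' = \frac{x - vt}{\sqrt{1 - v^2/c^2}}, \qquad y' = y, \qquad z' = z.$$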
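The two conditions used in this derivation can be checked numerically. The following is a minimal sketch, assuming units with $c = 1$ and an illustrative speed $v = 0.6$; the helper names `boost`, `matmul2` and `apply` are hypothetical, introduced only for this check.

```python
import math

def boost(v, c=1.0):
    """2x2 matrix acting on (ct, x) for a frame moving at speed v along x."""
    alpha = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return [[alpha, -v * alpha / c],
            [-v * alpha / c, alpha]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, ct, x):
    """Apply a 2x2 matrix to the column vector (ct, x)."""
    return (m[0][0] * ct + m[0][1] * x, m[1][0] * ct + m[1][1] * x)

v = 0.6  # illustrative speed in units where c = 1 (an assumption)
round_trip = matmul2(boost(v), boost(-v))  # should be the identity matrix
ct_p, x_p = apply(boost(v), 2.0, 2.0)      # an event on the light ray x = ct
print(round_trip)
print(ct_p, x_p)  # the two values agree: the ray is x' = ct' in S' as well
```

The round trip returning the identity is the condition that fixed $\alpha$, and the equal primed coordinates show the invariance of the speed of light for this one event.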

Standard Relativistic Notation

An "event" has four coordinates: position in three-dimensional space, plus time. That is, it's just a point in four-dimensional space, a.k.a. space time.  The standard notation is

$$(ct, x, y, z) \equiv (x^0, x^1, x^2, x^3).$$

Note that, for this position vector, we write “up” indices. 

It’s also standard notation to write:

$$\gamma = \frac{1}{\sqrt{1 - (v/c)^2}}, \qquad \boldsymbol{\beta} = \mathbf{v}/c, \qquad c = 1.$$

(But we’ll be bilingual, sometimes using the old notation for simple arguments and concepts.)

Matrix Form of the Lorentz Transformation

The Lorentz transformation equations to frame $S'$, moving at $v$ in the $x$-direction relative to $S$, which we just derived in the preceding section, can be written in the new notation as:

$$\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}.$$

But now we make a further change in notation: this is a Lorentz transformation, so we call the matrix $\Lambda$ (the Greek L for Lorentz), and furthermore, following Einstein, we write an element of the matrix as $\Lambda^\alpha{}_\beta$, so the equation above becomes

$$x'^\alpha = \Lambda^\alpha{}_\beta\, x^\beta.$$

You’ve probably come across Einstein’s dummy suffix notation, in which a suffix that appears twice is automatically summed over all its allowed values. In relativity, there is a further refinement: the suffix must appear once up, and once down, meaning that in writing the matrix the second suffix must be down for the summation rules to yield ordinary matrix multiplication when operating on a position vector.

(Hint: to check the sign, take the nonrelativistic limit, $x'^1 = -\beta\gamma x^0 + \gamma x^1 \to -vt + x$.)

We see from the equation that

$$\Lambda^\alpha{}_\beta = \frac{\partial x'^\alpha}{\partial x^\beta},$$

noting that the “down” matrix index corresponds to an “up” index in the denominator.

Contravariant and Covariant Vectors

A contravariant vector is a set of four numbers $A^\alpha$, defined in each inertial frame, that transform from one frame to another like the coordinates of an event $x^\alpha$: that is,

$$A'^\alpha = \Lambda^\alpha{}_\beta\, A^\beta.$$

Trivia: Why is it called contravariant? Historically, the fundamental transformation was of the basis axes. If the scale on the basis axes is doubled, say, then a vector (regarded as having an independent existence) will have its measured components halved: this is the “contra”.

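Since we’ll soon be returning to electromagnetism, obviously we need to think about Maxwell’s equations in this new notation, and the first step is to see how the differential operators $\partial/\partial x^\nu$ transform between frames.  More notation: we’ll sometimes write

$$\frac{\partial}{\partial x^\nu} = \partial_\nu.$$

In fact, the frame transformation for differentials is simple: from the chain rule of differentiation

$$\frac{\partial}{\partial x'^\mu} = \frac{\partial x^\nu}{\partial x'^\mu} \frac{\partial}{\partial x^\nu},$$

we have

$$\partial'_\mu = \partial_\nu \left(\Lambda^{-1}\right)^\nu{}_\mu, \qquad \left(\Lambda^{-1}\right)^\nu{}_\mu = \frac{\partial x^\nu}{\partial x'^\mu}.$$

Four-vectors that transform in this way, $B'_\mu = B_\nu \left(\Lambda^{-1}\right)^\nu{}_\mu$, are called covariant vectors. Note that they have down indices. (As we’ll explain in a moment, this can also be written $B'_\mu = \Lambda_\mu{}^\nu B_\nu$; be careful with those indices!)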

In ordinary three-dimensional space, regarding the contravariant vectors as column vectors, the covariant vectors are row vectors, and the matrix operates on them from the right. They are commonly called dual vectors, and in GR parlance they are often called forms.

Important: The dot product of a contravariant vector and a covariant vector is invariant in a Lorentz transformation:

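$$B'_\mu A'^\mu = B_\nu \left(\Lambda^{-1}\right)^\nu{}_\mu\, \Lambda^\mu{}_\sigma A^\sigma = B_\nu A^\nu,$$

since

$$\left(\Lambda^{-1}\right)^\nu{}_\mu\, \Lambda^\mu{}_\sigma = \frac{\partial x^\nu}{\partial x'^\mu} \frac{\partial x'^\mu}{\partial x^\sigma} = \frac{\partial x^\nu}{\partial x^\sigma} = \delta^\nu_\sigma,$$

where $\delta^\nu_\sigma$ is the usual Kronecker delta in four dimensions, $\delta^\nu_\sigma = 1$ if $\nu = \sigma$, zero otherwise.

$$\Lambda^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}, \qquad \left(\Lambda^{-1}\right)^\nu{}_\mu = \Lambda_\mu{}^\nu = \frac{\partial x^\nu}{\partial x'^\mu}.$$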

Notice how important it is to be clear about which index is up and which down!

So we have these two different transformation rules: the contravariant one, a transformation with the same matrix as event coordinates, $dx'^\mu = \Lambda^\mu{}_\nu\, dx^\nu$, and what is called the covariant one, having the same matrix as the differential operator set, $\dfrac{\partial}{\partial x'^\mu} = \Lambda_\mu{}^\nu \dfrac{\partial}{\partial x^\nu}$.  This last equation is usually written

$$\partial'_\mu = \Lambda_\mu{}^\nu\, \partial_\nu,$$

and any set of four numbers defined in each frame that transforms like this is called a covariant vector (and has down indices).

Since $\Lambda^\mu{}_\nu$, $\Lambda_\mu{}^\nu$ are inverses of each other, a product of a covariant and a contravariant vector is invariant:

$$A'^\mu B'_\mu = \Lambda^\mu{}_\nu \Lambda_\mu{}^\sigma A^\nu B_\sigma = \delta_\nu^\sigma A^\nu B_\sigma = A^\sigma B_\sigma.$$

The Metric Tensor.  Magnitude of a Vector

Exercise:  check that under a Lorentz transformation, $x^2 - c^2t^2$ is invariant.  In fact, this quantity is called the magnitude of the four-vector.  Unlike most magnitudes, this one can be negative, or zero, for a nonzero vector.  To write it in terms of the new notation, we have to "square" the vector $x^\mu$, remembering that the time and space contributions have opposite signs.

The way this is done is to introduce a metric tensor,

$$g_{\mu\nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

(Some authors, including Jackson, have an overall minus sign! See the discussion below.)

With this, the position vector $(x^0, x^1, x^2, x^3)$ can be converted to one with down indices by:

$$x_\mu = g_{\mu\nu}\, x^\nu,$$

and we see this gives $(x_0, x_1, x_2, x_3) = (-x^0, x^1, x^2, x^3)$.

(The index can be raised with $g^{\mu\nu}$, which is the inverse of $g_{\mu\nu}$, except that in our special relativity case, they're the same.)

The magnitude of the vector is written

$$x_\mu x^\mu = g_{\mu\nu}\, x^\mu x^\nu = \mathbf{x}^2 - c^2t^2, \qquad \mathbf{x}^2 = x^2 + y^2 + z^2.$$
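As a small numerical illustration of lowering an index and forming the magnitude, here is a sketch assuming the $(-1, 1, 1, 1)$ metric used in these notes and a boost along $x$ with $\beta = 0.6$. The function names `lower`, `magnitude`, `boost_x` and the sample event are hypothetical choices made for this example.

```python
import math

G = [-1.0, 1.0, 1.0, 1.0]  # diagonal of g_{mu nu}, signature (-, +, +, +)

def lower(x_up):
    """x_mu = g_{mu nu} x^nu for the diagonal metric above."""
    return [g * xi for g, xi in zip(G, x_up)]

def magnitude(x_up):
    """x_mu x^mu = -(x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2."""
    return sum(lo * up for lo, up in zip(lower(x_up), x_up))

def boost_x(x_up, beta):
    """Boost along x with speed beta = v/c, acting on (ct, x, y, z)."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    ct, x, y, z = x_up
    return [gamma * (ct - beta * x), gamma * (x - beta * ct), y, z]

event = [1.0, 3.0, 2.0, -1.0]           # hypothetical (ct, x, y, z)
before = magnitude(event)                # -1 + 9 + 4 + 1 = 13
after = magnitude(boost_x(event, 0.6))   # same value up to rounding
print(lower(event), before, after)
```

Only the time component changes sign on lowering, and the magnitude agrees in the two frames to within floating-point rounding.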

Incidentally, we can see from the Lorentz transformation above why the matrix $\Lambda_\mu{}^\nu$ is the inverse of $\Lambda^\mu{}_\nu$.  Lowering the first index changes the signs of the elements $\Lambda^0{}_0, \Lambda^0{}_1, \Lambda^0{}_2, \Lambda^0{}_3$, and raising the second index changes the signs of $\Lambda^0{}_0, \Lambda^1{}_0, \Lambda^2{}_0, \Lambda^3{}_0$ (so $\Lambda^0{}_0$ ends up unchanged); the net result is to reverse the velocity.
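The index argument can be checked directly for the $x$-boost. This is a sketch under the same assumptions as the text (diagonal metric $-1, 1, 1, 1$, boost along $x$); `boost_matrix` and `lower_then_raise` are hypothetical helper names.

```python
import math

def boost_matrix(beta):
    """4x4 boost along x, Lambda^mu_nu, for speed beta = v/c."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

METRIC = [-1.0, 1.0, 1.0, 1.0]  # diagonal of g, which equals its own inverse

def lower_then_raise(lam):
    """g_{mu a} Lambda^a_b g^{b nu}: flips the signs of row 0 and column 0."""
    return [[METRIC[m] * lam[m][n] * METRIC[n] for n in range(4)]
            for m in range(4)]

beta = 0.6  # illustrative value
converted = lower_then_raise(boost_matrix(beta))
reversed_boost = boost_matrix(-beta)
max_diff = max(abs(converted[i][j] - reversed_boost[i][j])
               for i in range(4) for j in range(4))
print(max_diff)  # essentially zero: the result is the boost with -v
```

The element in the top-left corner is flipped twice and so is unchanged, while the time-space elements are flipped once, which is exactly the boost with the velocity reversed.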

The Interval

One more piece of jargon: the interval.  Since the Lorentz transformation is linear, and true for arbitrary space time points, the four-vector difference $\Delta x^\mu$ between two space time points clearly also transforms as a four-vector. Its magnitude is

$$\Delta s^2 = \Delta x_\mu \Delta x^\mu = -(\Delta x^0)^2 + (\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 = -c^2 \Delta t^2 + \Delta x^2 + \Delta y^2 + \Delta z^2,$$

and this "square", the so-called magnitude, rather than the vector itself, is called the interval.

Obviously, it can be positive, negative, or zero.

Warning: Sign of the Metric Tensor

We have chosen a diagonal metric tensor with elements -1, 1, 1, 1, so spacelike separated points have a positive interval separation. Unfortunately, an almost equally popular choice is 1, -1, -1, -1, sometimes called a timelike metric, and the one used by Jackson.  The spacelike metric is standard in General Relativity, the timelike more common in High Energy Physics.

Spacelike, Timelike, Lightlike

Two events $(ct_1, x_1, y_1, z_1)$, $(ct_2, x_2, y_2, z_2)$ are said to be spacelike separated if the interval between them satisfies $-c^2(t_2 - t_1)^2 + (x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 = \Delta s^2 > 0$.

It is important to note that spacelike separation in one inertial frame of reference means spacelike separation in all inertial frames, since the magnitude is invariant under Lorentz transformation.

Similarly, timelike separation is $\Delta s^2 < 0$, lightlike separation $\Delta s^2 = 0$.
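The three cases can be summarized in a short sketch, assuming the $(-1, 1, 1, 1)$ sign convention of these notes and events given as $(ct, x, y, z)$ tuples. The function names and sample events are hypothetical.

```python
def interval(event1, event2):
    """Delta s^2 with signature (-, +, +, +); events are (ct, x, y, z)."""
    d = [b - a for a, b in zip(event1, event2)]
    return -d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + d[3] ** 2

def separation(event1, event2):
    """Classify the separation of two events by the sign of the interval."""
    s2 = interval(event1, event2)
    if s2 > 0:
        return "spacelike"
    if s2 < 0:
        return "timelike"
    return "lightlike"

origin = (0, 0, 0, 0)
print(separation(origin, (1, 2, 0, 0)))  # spacelike
print(separation(origin, (2, 1, 0, 0)))  # timelike
print(separation(origin, (5, 3, 4, 0)))  # lightlike
```

With the opposite (timelike) metric convention the two inequalities would swap, so the sign choice should be stated whenever such a classification is used.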

Points lightlike separated from the origin are said to be on the light cone, which is really two cones having vertices at the origin, the forward (in time) light cone, and the backward light cone. A light signal sent from the origin (meaning $ct = x = 0$) could trigger an event (a bomb?) anywhere on the forward light cone; a light signal from anywhere on the backward light cone could trigger an event at the origin.

An event at the origin cannot be the cause of another event which is outside the forward light cone.

Exercise:  Imagine two observers in inertial frames moving relative to each other. Each observer has light detectors placed throughout the frame. The origins coincide at t=0,  and at that moment a light flashes at the common origin. From the detectors, each observer will say that a spherical surface of light goes outwards, centered at that observer’s origin (and remember these origins are moving relative to each other). Explain why there is no contradiction here.   

Worldlines: As a particle moves through space time, the path traced is termed the worldline.  Since particles travel at less than the speed of light, the world line lies within the forward light cone.  A particle at rest has a worldline along the axis of the cone: in other words, the time axis.  A photon has a world line on the surface of the light cone.

Relativistic Addition of Velocities

Deriving the Equations

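As stated above, $(c\Delta t, \Delta x, \Delta y, \Delta z)$ transforms just as $(ct, x, y, z)$ does. We'll write the transformation

$$\Delta t = \gamma\left(\Delta t' + v\Delta x'/c^2\right), \qquad \Delta x = \gamma\left(v\Delta t' + \Delta x'\right), \qquad \Delta y = \Delta y', \qquad \Delta z = \Delta z'.$$

From these equations in the limit of small displacements, $\Delta x/\Delta t$ gives the addition of velocities formulas

$$\frac{\Delta x}{\Delta t} = u_x = \frac{\Delta x' + v\Delta t'}{\Delta t' + v\Delta x'/c^2} = \frac{u'_x + v}{1 + u'_x v/c^2}$$

and

$$u_y = \frac{u'_y}{\gamma\left(1 + u'_x v/c^2\right)}.$$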

(Recall the primed frame is moving at v  in the positive x -direction relative to the unprimed frame.)
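A short numerical sketch of these two formulas follows, assuming units with $c = 1$. The function name `add_velocities` and the sample speeds are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
import math

def add_velocities(ux_p, uy_p, v):
    """Velocity (u_x, u_y) in S from (u'_x, u'_y) in S', with c = 1.

    S' moves at v along +x relative to S.
    """
    denom = 1 + ux_p * v
    gamma = 1 / math.sqrt(1 - float(v) ** 2)
    return (ux_p + v) / denom, float(uy_p) / (gamma * float(denom))

# Collinear case with exact rational arithmetic: 0.5c "plus" 0.5c.
ux, _ = add_velocities(Fraction(1, 2), 0, Fraction(1, 2))
print(ux)  # 4/5

# Transverse case: light sent along y' in S' still has speed c in S.
ux_t, uy_t = add_velocities(0.0, 1.0, 0.6)
print(math.hypot(ux_t, uy_t))  # equals 1 up to rounding
```

The second case shows why the factor $\gamma$ is needed in $u_y$: without it the transverse light ray would exceed $c$ in the unprimed frame.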

Exercise:   Suppose a space station moving at $v$ in the $x$-direction relative to an observer sends a rocket ship forward at $u$ relative to the station.  What is the velocity of the rocket ship relative to the “stationary” observer?

Now suppose the space station is moving at 0.8c relative to the observer, the rocket ship moves at 0.8c relative to the space station, and the rocket ship fires a missile forward at 0.8c relative to itself. What is the speed of the missile relative to the original observer?

Rotations and Boosts, Rapidity

Notice now that the 4 x 4 Lorentz matrix can also represent ordinary rotations in the three-dimensional space:

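$$\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & \sin\theta & 0 \\ 0 & -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix},$$

and manifestly $x_\mu x^\mu$ is invariant.  Any three-dimensional space rotation can be represented by the lower-right 3 × 3 block of the full 4 × 4 matrix.

In fact, the Lorentz transformation to a moving frame (called a "boost") can be formulated in a strikingly similar way, in terms of a variable much favored by high energy physicists, the rapidity $\psi$, defined by

$$\frac{v}{c} = \beta = \tanh\psi, \qquad \gamma = \frac{1}{\sqrt{1 - \beta^2}} = \cosh\psi.$$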

  Rapidity proves to be a very useful parameter, because for one thing

$$\tanh(\psi + \psi') = \frac{\tanh\psi + \tanh\psi'}{1 + \tanh\psi \tanh\psi'},$$

which is exactly the Lorentz addition formula for velocities!  (Recall "$u + v$" $= \dfrac{u + v}{1 + uv/c^2}$.)  This means that in successive boosts you just add the rapidities.
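The additivity of rapidities is easy to confirm numerically. This sketch assumes collinear boosts, units with $c = 1$, and two illustrative speeds; `add_velocities` is a hypothetical helper implementing the formula just quoted.

```python
import math

def add_velocities(u, v):
    """Relativistic sum of collinear speeds, in units where c = 1."""
    return (u + v) / (1 + u * v)

u, v = 0.5, 0.6                         # illustrative speeds (fractions of c)
psi_total = math.atanh(u) + math.atanh(v)  # rapidities simply add
via_rapidity = math.tanh(psi_total)
direct = add_velocities(u, v)
print(via_rapidity, direct)             # the two agree up to rounding
```

Because $\tanh\psi < 1$ for every finite $\psi$, adding rapidities can never produce a speed of $c$ or more.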

The Lorentz transformation for boosting from rest to a rapidity ψ  along the x -axis is:

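$$\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \cosh\psi & -\sinh\psi & 0 & 0 \\ -\sinh\psi & \cosh\psi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}.$$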

That is, a particle at rest in the moving (boosted) frame is moving with rapidity ψ  in the original frame.

Notice the similarity to the three-dimensional rotation!  The sign difference ensures both transformations have unit determinant ($\cos^2\theta + \sin^2\theta = 1$ for the rotation, $\cosh^2\psi - \sinh^2\psi = 1$ for the boost).  Some authors (for instance, Zangwill) take time to be an imaginary variable, so the rotation and boost transformations look identical, but we'll stick with the more common practice.  (There are in fact deep mathematical differences between rotations and boosts, as we'll see.)

Lorentz Transformation for Arbitrary Direction

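For a boost of $v$ in the $x$-direction the coordinates in the boosted frame are:

$$\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}.$$

For a boost of $\mathbf{v} = \boldsymbol{\beta} c$, the corresponding matrix $M$ is:

$$\begin{pmatrix} x'^0 \\ x'^1 \\ x'^2 \\ x'^3 \end{pmatrix} = \begin{pmatrix} \gamma & -\beta_1\gamma & -\beta_2\gamma & -\beta_3\gamma \\ -\beta_1\gamma & 1 + \dfrac{(\gamma - 1)\beta_1^2}{\beta^2} & \dfrac{(\gamma - 1)\beta_1\beta_2}{\beta^2} & \dfrac{(\gamma - 1)\beta_1\beta_3}{\beta^2} \\ -\beta_2\gamma & \dfrac{(\gamma - 1)\beta_1\beta_2}{\beta^2} & 1 + \dfrac{(\gamma - 1)\beta_2^2}{\beta^2} & \dfrac{(\gamma - 1)\beta_2\beta_3}{\beta^2} \\ -\beta_3\gamma & \dfrac{(\gamma - 1)\beta_1\beta_3}{\beta^2} & \dfrac{(\gamma - 1)\beta_2\beta_3}{\beta^2} & 1 + \dfrac{(\gamma - 1)\beta_3^2}{\beta^2} \end{pmatrix} \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix}.$$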

Notice first that this does give the right answer for the boost along the x  -axis.

But how did we come up with this matrix M?  

Our strategy for boosting in an arbitrary direction is to reorient the system so that that direction becomes the x  -axis, apply our known boost, then rotate it back.

To see how this works, we write the above matrix in terms of blocks, as follows (vectors in bold)

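$$\begin{pmatrix} \gamma & -\gamma\boldsymbol{\beta}^T \\ -\gamma\boldsymbol{\beta} & I + (\gamma - 1)\boldsymbol{\beta}\boldsymbol{\beta}^T/\beta^2 \end{pmatrix}.$$

In this same block notation, a three-dimensional rotation has the form

$$\begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix}$$

and its inverse is $\begin{pmatrix} 1 & 0 \\ 0 & R^T \end{pmatrix}$.   If we choose $R$ such that $R\boldsymbol{\beta}$ points along the $x$-axis, then

$$\begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} \begin{pmatrix} \gamma & -\gamma\boldsymbol{\beta}^T \\ -\gamma\boldsymbol{\beta} & I + (\gamma - 1)\boldsymbol{\beta}\boldsymbol{\beta}^T/\beta^2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & R^T \end{pmatrix} = \begin{pmatrix} \gamma & -\gamma\boldsymbol{\beta}^T R^T \\ -\gamma R\boldsymbol{\beta} & I + (\gamma - 1) R\boldsymbol{\beta}\boldsymbol{\beta}^T R^T/\beta^2 \end{pmatrix}$$

where $R\boldsymbol{\beta} = \begin{pmatrix} \beta \\ 0 \\ 0 \end{pmatrix}$, $\boldsymbol{\beta}^T R^T = \begin{pmatrix} \beta & 0 & 0 \end{pmatrix}$, so, putting those in, that last fearsome-looking matrix is actually trivial: it’s just the boost along the $x$-axis, as we want.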
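The general boost matrix $M$ can be built and tested numerically. The sketch below assumes $0 < |\boldsymbol{\beta}| < 1$ and the $(-1, 1, 1, 1)$ metric; `general_boost` and `preserves_metric` are hypothetical names. It checks that $M$ reduces to the $x$-boost and that $M^T g M = g$, which is the matrix statement that the interval is invariant.

```python
import math

def general_boost(b):
    """4x4 boost matrix M for velocity beta = (b1, b2, b3), in units of c.

    Assumes 0 < |beta| < 1 (the formula divides by beta^2).
    """
    b2 = sum(bi * bi for bi in b)
    gamma = 1.0 / math.sqrt(1.0 - b2)
    m = [[0.0] * 4 for _ in range(4)]
    m[0][0] = gamma
    for i in range(3):
        m[0][i + 1] = m[i + 1][0] = -gamma * b[i]
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            m[i + 1][j + 1] = delta + (gamma - 1.0) * b[i] * b[j] / b2
    return m

def preserves_metric(m, tol=1e-12):
    """Check M^T g M = g for g = diag(-1, 1, 1, 1)."""
    g = [-1.0, 1.0, 1.0, 1.0]
    for a in range(4):
        for c in range(4):
            total = sum(m[k][a] * g[k] * m[k][c] for k in range(4))
            expected = g[a] if a == c else 0.0
            if abs(total - expected) > tol:
                return False
    return True

m_x = general_boost((0.6, 0.0, 0.0))     # should reduce to the x-boost
m_any = general_boost((0.3, -0.4, 0.5))  # an arbitrary direction
print(m_x[0][:2], m_x[1][:2], preserves_metric(m_any))
```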

Exercise: By working out the matrix multiplication for a vector, $A' = MA$, taking $\mathbf{A}$ in components parallel and perpendicular to the boost direction, prove that (Jackson page 526)

$$A'_0 = \gamma\left(A_0 - \boldsymbol{\beta}\cdot\mathbf{A}\right), \qquad A'_\parallel = \gamma\left(A_\parallel - \beta A_0\right), \qquad \mathbf{A}'_\perp = \mathbf{A}_\perp.$$

Actually this is confusing: the matrix $M$ as written above operates on up index vectors. These look like down indices, but Jackson adds a footnote saying these are really elements of an up index (contravariant) vector… (The only difference would be the sign of the velocity; you should always check by looking at the low velocity limit.)

A Bit of Group Theory

The Lorentz boosts along the $x$-axis form an Abelian (commutative) group, just as the set of rotations in a plane does.  The rotations in a plane are a subgroup of the group of three-dimensional rotations, which is of course non-Abelian.  What about the set of all Lorentz boosts?  It turns out that this is not a group. A product of two Lorentz boosts in different directions is not just a Lorentz boost in some combined direction; it also has some rotation.  (We shall see the importance of this when we discuss the Thomas precession.)  The Lorentz group is the group of boosts plus rotations. Unfortunately, we do not have time to present the relevant group theory in terms of generators, etc., here.
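One quick way to see that boosts alone do not close under multiplication is to use the fact that every pure boost matrix is symmetric (as the general matrix $M$ above is). The following sketch, with hypothetical helper names and an illustrative $\beta = 0.6$, multiplies an $x$-boost by a $y$-boost and finds a non-symmetric result, which therefore cannot be a pure boost.

```python
import math

def boost(axis, beta):
    """4x4 boost along coordinate axis 1 (x) or 2 (y) with speed beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    m = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    m[0][0] = m[axis][axis] = gamma
    m[0][axis] = m[axis][0] = -beta * gamma
    return m

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def is_symmetric(m, tol=1e-12):
    return all(abs(m[i][j] - m[j][i]) < tol
               for i in range(4) for j in range(4))

bx, by = boost(1, 0.6), boost(2, 0.6)
xy, yx = matmul(bx, by), matmul(by, bx)
same_axis = matmul(boost(1, 0.3), boost(1, 0.6))
print(is_symmetric(bx), is_symmetric(xy), xy == yx, is_symmetric(same_axis))
```

The two orderings also differ from each other, so boosts in different directions do not commute, while two boosts along the same axis still give a symmetric (pure boost) product.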

Proper Time and Four-Velocity

Consider a spaceship going from one planet to another. The planets might have quite different velocities, so the distance covered by the ship will be different in the two planet rest frames.  One thing that won't be different is the time elapsed as measured by the crew of the spaceship.  This is called the proper time of the spaceship: the clock is always with the ship.  (Strictly, we’re assuming here that all frames are inertial. Otherwise, we need GR.)

An increment of proper time is denoted by $d\tau$.

If the spaceship moves $dx^\mu$ in time $d\tau$, this incremental displacement transforms as a Lorentz four-vector.  Therefore, so does

$$U^\mu = \frac{dx^\mu}{d\tau}.$$

The four-vector $U^\mu$ is called the four-velocity.  In the nonrelativistic limit it becomes $(c, v^i)$, the spatial part just the ordinary velocity, and $\tau \to t$, $x^0 = ct$.

Now $U_\mu U^\mu = \dfrac{dx_\mu\, dx^\mu}{d\tau^2}$, but $dx_\mu\, dx^\mu$ is just the interval, which has the same value in all frames, including the frame in the ship, where it is $-(c\,d\tau)^2$, so

$$U_\mu U^\mu = -c^2.$$

In the rest frame, where the incremental movement along the world line $dx^\mu$ is purely in the time direction, and is just $c\,d\tau$, the four-velocity is $(c, 0, 0, 0)$.

In general, it's $(\gamma c, \gamma v_1, \gamma v_2, \gamma v_3)$.   (In relativity papers and books, these formulas usually appear with $c = 1$.)
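As a check on the normalization, here is a sketch that builds the four-velocity from an ordinary velocity and evaluates $U_\mu U^\mu$ with the $(-1, 1, 1, 1)$ metric. The function names are hypothetical, and SI units are assumed only for concreteness.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_velocity(v):
    """U^mu = (gamma c, gamma v) for an ordinary velocity v = (vx, vy, vz)."""
    speed2 = sum(vi * vi for vi in v)
    gamma = 1.0 / math.sqrt(1.0 - speed2 / C ** 2)
    return [gamma * C] + [gamma * vi for vi in v]

def norm2(u):
    """U_mu U^mu with metric diag(-1, 1, 1, 1)."""
    return -u[0] ** 2 + sum(ui * ui for ui in u[1:])

u = four_velocity((0.6 * C, 0.0, 0.0))
print(norm2(u) / C ** 2)              # close to -1 for any allowed velocity
print(four_velocity((0.0, 0.0, 0.0)))  # rest frame: (c, 0, 0, 0)
```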

Minkowski Diagrams: Axes and Scales in the Transformed Frame

Contrasting Ordinary Rotations and Lorentz Transformations

Obviously, for ordinary rotations in a plane, the change of axes and scales is trivial: the transformation is

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix},$$

the axes are rotated and the measuring scale doesn’t change: $x' = 1$ is where the new axis intersects the invariant unit circle

$$x^2 + y^2 = 1.$$

On the other hand, for Lorentz transformations, things are a little more complicated:  instead of the simple invariant circles $x^2 + y^2 = R^2$, we evidently have invariant hyperbolae, $-c^2t^2 + x^2 = a^2$, or $-c^2t^2 + x^2 = -b^2$ ($a, b$ real).

The natural variable to describe these transformations is the rapidity ψ,  so

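$$\begin{pmatrix} ct' \\ x' \end{pmatrix} = \begin{pmatrix} \cosh\psi & -\sinh\psi \\ -\sinh\psi & \cosh\psi \end{pmatrix} \begin{pmatrix} ct \\ x \end{pmatrix} = \Lambda(\psi) \begin{pmatrix} ct \\ x \end{pmatrix},$$

and of course $\cosh^2\psi - \sinh^2\psi = 1$.

Recall that $\tanh\psi = 0$ for $\psi = 0$, and $\tanh\psi \to \pm 1$ as $\psi \to \pm\infty$.

To see how the axes appear in the transformed frame, recall first that the lines $ct = \pm x$ must go to $ct' = \pm x'$: they constitute the two-dimensional version of the light cone.

This light cone invariance only works because there is one sign change in $\Lambda(\psi)$ compared with $R(\theta)$ (look at the matrices above).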

That sign change means that under the transformation the $t', x'$ axes turn in opposite directions away from the original $t, x$ axes, so on going to larger and larger boosts, the axes close like scissors around the line $x = ct$, never reaching it, of course.

This is easy to see from the equations:  the $t$-axis is the line $x = 0$, the $t'$-axis is the line $x' = 0$, or $x = vt$.

Put another way: the $t$-axis is the “world line”, meaning the path in space time, of an object at rest at the origin in the original frame; the $t'$-axis is the world line of an object at rest at the origin of the primed frame.

The $x'$-axis is the line $t' = 0$, so $t = vx$ in the original frame (taking $c = 1$).

So the primed frame axes are the original axes turned through opposite angles $\pm\theta$, with $\tan\theta = v = \tanh\psi$ (again with $c = 1$).  This means that for small speeds, $\theta, v, \psi$ are close, but as $\psi$ goes to infinity, $\theta$ just approaches 45°.

These diagrams, called Minkowski diagrams, were first drawn by Minkowski a few years after Einstein published his special relativity paper.

Finding Length and Time Scales in a New Frame:  Invariant Hyperbolae

We’ve now seen how the axes move, but we haven’t tracked what happens to the calibration (the scale) on the axes.

The way to do that is to use an invariant hyperbola, for example $-c^2t^2 + x^2 = 1 = -c^2t'^2 + x'^2$.

This hyperbola cuts the $x$-axis at $x = 1$, and the $x'$-axis at $x' = 1$.  Note that $x' = 1$ is the tangent line to the unit hyperbola $x^2 - c^2t^2 = 1$: it’s the minimum possible value of $x'$ on that hyperbola.

Lorentz Contraction

Notice that from the diagram, in the $x, ct$ plane the point $x' = 1$ (on the hyperbola) is further from the origin than the point $x = 1$.  Does this mean that a rod of unit length at rest in the primed frame (say, stretching from $x' = 0$ to $x' = 1$) will appear longer than unity in the $x, ct$ frame?

Presumably not: that would be the opposite of Lorentz contraction.

So what’s going on?  The essential point is that we’re looking at the $x$-positions of the ends of the rod at different times $t$.  To measure the length of a moving rod, we obviously need to find the $x$-values of the end points at the same time $t$. Keep reading.

World Lines

As we mentioned earlier, the world line of a particle (or of a small part of a solid object) is its path in four-dimensional spacetime.

Here are some sample world lines in a two-dimensional subspace.  First, the light cone sections are world lines of photons, traveling at c.   A particle moving at constant velocity x=vt  is shown. The world line must be steeper than the light cone, since v<c . A particle at rest has a world line parallel to the t  axis, so the t  axis itself is the world line of a particle at rest at the origin.

Now, back to measuring the moving rod. We need to plot the world lines of the two ends, and find how far apart they are at, say, $t = 0$.  Look back at the original diagram. The world line of the left end is just that of the primed origin, that is, it’s the $t'$-axis, $x' = 0$.   The other end is moving at the same velocity, so its world line has the same slope: it’s $x' = 1$, the tangent line to the unit hyperbola $x^2 - c^2t^2 = 1$, as mentioned earlier.

Plotting the two world lines, we can see that their simultaneous intercepts on the axis are at points less than one unit apart.
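The graphical result can be confirmed algebraically from the Lorentz transformation $x' = \gamma(x - vt)$ derived earlier; the following short worked step is a sketch of that check for the rod with ends at $x' = 0$ and $x' = 1$.

```latex
\begin{aligned}
\text{left end: } & x' = \gamma(x - vt) = 0 \;\Rightarrow\; x = vt, \quad\text{so } x = 0 \text{ at } t = 0,\\
\text{right end: } & x' = \gamma(x - vt) = 1 \;\Rightarrow\; x = vt + 1/\gamma, \quad\text{so } x = 1/\gamma \text{ at } t = 0,\\
\text{length at } t = 0: \; & 1/\gamma = \sqrt{1 - v^2/c^2} < 1 .
\end{aligned}
```

Both world lines have the same slope, and their separation measured at a single time $t$ is the contracted length $1/\gamma$.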

Exercise:  check your understanding by using similar arguments to show that a rod of unit length at rest in the original unprimed frame will have length measured as less than one in the primed frame.

Time Dilation

We’ve just seen how an invariant space-like hyperbola can explain how each observer can see the other as Lorentz contracted. A time-like invariant hyperbola can show us that each sees the other’s clock as running slow.

Here the invariant hyperbola is $-c^2t^2 + x^2 = -1 = -c^2t'^2 + x'^2$:

The red parallel lines here are the lines $ct' = 0$ and $ct' = 1$, both lines of simultaneity in the primed frame.

Suppose first that a clock in the unprimed frame flashes once a second.  The initial flash is seen by both observers to be at their common origin, $ct = ct' = 0$. The second flash, at $ct = 1$, is clearly at $ct' > 1$: the clock is running slow in the primed frame.

Now suppose a clock in the primed frame is flashing once a second.  As before, the initial flash is at the common origin. The next flash, at $ct' = 1$, $x' = 0$, is clearly at $ct > 1$.
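Both statements can be read off from the Lorentz transformation written as $ct' = \gamma(ct - \beta x)$ and its inverse $ct = \gamma(ct' + \beta x')$; the lines below are a sketch of that check for the second flash in each case.

```latex
\begin{aligned}
\text{clock at rest in } S: \; & (ct, x) = (1, 0) \;\Rightarrow\; ct' = \gamma(1 - \beta \cdot 0) = \gamma > 1,\\
\text{clock at rest in } S': \; & (ct', x') = (1, 0) \;\Rightarrow\; ct = \gamma(1 + \beta \cdot 0) = \gamma > 1 .
\end{aligned}
```

The symmetry between the two lines is the algebraic form of the statement that each observer sees the other’s clock running slow by the same factor $\gamma$.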



The acceleration four-vector is defined as $a^\mu = dU^\mu/d\tau$.  Notice that since the four-velocity has constant magnitude, $U_\mu U^\mu = -c^2$, the four-acceleration is always orthogonal to the four-velocity: $U_\mu\, dU^\mu/d\tau = 0$, and so in the (instantaneous rest) frame of the moving object the four-acceleration has only spatial components.

The acceleration four-vector can also be written

$$a^\alpha = \frac{dU^\alpha}{d\tau} = \frac{\partial U^\alpha}{\partial x^\beta} \frac{dx^\beta}{d\tau} = U^\beta \partial_\beta U^\alpha.$$