
Additional Material for Chapter 6

In Basic Universes with Origins, Any Affine Transformation Equals a Linear Transformation Plus the Image of the Origin

Intuitively speaking,       F = L + F(O)

Let F be an affine transformation from a basic universe to a basic universe (both universes may be the same or different). Both universes have origins. Let O denote the origin in the first universe. The origin in the second universe is not denoted by anything. F(O) is the fixed image in the second universe (and need not be the origin there). Intuitively define a function L by

L(V) = F(V) -- F(O) for any point V in the first universe
Strictly speaking the subtraction of points has not actually been defined. Since the universes have origins, position vectors must be used in the real definition:
L(v) = F(v) -- F(o) for any position vector v
It remains to show that L is a linear transformation.

To show first that   L(p + q) = L(p) + L(q) for any position vectors p,q
Construct the addition parallelogram for the vector sum of p, q, and let

s = p + q
Then O, P, S, Q are the vertices of this parallelogram in the first universe.
Since F is affine, F(O), F(P), F(S), F(Q) are vertices of an (image) parallelogram in the second universe. Then segment [F(Q),F(S)] is parallel and congruent to segment [F(O),F(P)]. Therefore, F(s) -- F(q) = F(p) -- F(o). Rearranging,
F(s) = F(p) + F(q) -- F(o)
Therefore,
L(p + q) = L(s) = F(s) -- F(o) = (F(p) + F(q) -- F(o)) -- F(o) = (F(p) -- F(o)) + (F(q) -- F(o)) = L(p) + L(q)

Finally to show that   L(λp) = λL(p) for any position vector p and any real number λ
Notice first that L(O) = O' (the origin in the second universe) because L(O) = F(O) -- F(O) = O'. Therefore this part of the proof is true if λ = 0 or P = O. Assume P is not the origin O. Let Q be the point located by the position vector λp; that is, q = λp. Then points O, P, Q are collinear, and λ is also the proportionality factor

OQ = λOP
It is also the proportionality factor for the images F(O), F(P), F(Q), which lie on a common line in the second universe. Let O', P', Q' denote these images respectively. (Notice that O' need not be the origin in the second universe.) Since F is affine, the same proportionality factor holds there:
O'Q' = λO'P'
Since there is an origin somewhere in the second universe,
q' -- o' = λ(p' -- o')
Then
L(λp) = L(q) = F(q) -- F(o) = q' -- o' = λ(p' -- o') = λ(F(p) -- F(o)) = λL(p)
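The decomposition F = L + F(O) can be checked numerically on one example. The sketch below assumes a hypothetical affine map F of the plane (its coefficients are chosen only for illustration and do not come from the text) and uses exact rational arithmetic so the comparisons are not affected by rounding.

```python
from fractions import Fraction

ORIGIN = (0, 0)

def F(v):
    """A hypothetical affine map of the plane (illustrative coefficients)."""
    x, y = v
    return (2 * x + 3 * y + 1, x - y + 4)

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

def scale(lam, a):
    return (lam * a[0], lam * a[1])

def L(v):
    """L(v) = F(v) - F(o), as defined in the text."""
    return sub(F(v), F(ORIGIN))

p, q = (3, 4), (-1, 5)
lam = Fraction(-7, 2)

assert L(add(p, q)) == add(L(p), L(q))        # L(p + q) = L(p) + L(q)
assert L(scale(lam, p)) == scale(lam, L(p))   # L(lam p) = lam L(p)
assert F(p) == add(L(p), F(ORIGIN))           # F = L + F(O)
```

A passing check on a few vectors is only a sanity test of the statement; the proof above is what establishes it for every position vector.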




Chain of Equalities for F(x,y) = (2x,3y) to be a Linear Transformation

   F(p + q) = F((x1,y1) + (x2,y2)) = F(x1+x2,   y1+y2) =
                                       (2(x1+x2), 3(y1+y2)) = (2x1,3y1) + (2x2,3y2) =
                                                                 F(x1, y1) + F(x2, y2) = F(p) + F(q)
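The chain of equalities above can be spot-checked on concrete numbers. This is a minimal sketch: the sample vectors p, q and the scalar lam are arbitrary choices, and the scalar check is added here as a companion to the additivity chain shown in the text.

```python
def F(v):
    """The map F(x, y) = (2x, 3y) from the text."""
    x, y = v
    return (2 * x, 3 * y)

p, q = (5, -2), (7, 11)

# Additivity: F(p + q) = F(p) + F(q)
lhs = F((p[0] + q[0], p[1] + q[1]))
rhs = tuple(a + b for a, b in zip(F(p), F(q)))
assert lhs == rhs

# Scalar multiples: F(lam p) = lam F(p)
lam = 4
assert F((lam * p[0], lam * p[1])) == tuple(lam * c for c in F(p))
```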



Geometric approach to linear transformations

Since these linear universes are sets, there exist functions that carry all the points from a linear universe into a linear universe. These functions will be given some further conditions to become acceptable functions. For example, a function may carry all the points of a line onto a line. It is possible to find a function that carries an entire plane onto a single line. A function that carries a line onto a semi-circle would not be considered acceptable since a semi-circle is not one of the four linear universes. Another condition on acceptable functions is that the origin is carried by them onto the origin. A function that carries an entire linear universe onto a single point (it must be an origin) is called trivial or constant. Such functions are acceptable, but not very interesting.

One of the geometric figures that receives much attention in these discussions is the parallelogram. Only functions that preserve the form of a parallelogram will be acceptable. This means that if points O,P,R,Q are vertices of a parallelogram, then images O,F(P),F(R),F(Q) must also be vertices of a parallelogram. (See adjacent Fig1.) A slight problem occurs if F carries all of the points in the first linear universe onto points of some line. Then the four images are points of a "collapsed parallelogram". (See adjacent Fig2.)

The opposite sides of a parallelogram are parallel. This forces acceptable non-trivial functions to carry parallel lines onto parallel lines. (Assume that any line is parallel to itself.) Such functions are said to "preserve parallelism." There are functions in a higher geometry, called projective geometry, that carry lines onto lines but do not preserve parallelism.

There is another condition that "acceptable functions" must satisfy. Take any two parallel line segments. They may or may not be congruent, but the quotient of their lengths forms a ratio. An acceptable function will carry these parallel segments onto parallel segments, preserving the ratio of their lengths. Therefore, if A and B are the end points of the first segment, and C and D are the end points of the second segment, then F(A) and F(B) are the end points of the image of the first segment, and F(C) and F(D) are the end points of the image of the second segment. If λ = length AB / length CD, then also λ = length F(A)F(B) / length F(C)F(D), provided F(C) and F(D) are distinct.

It should be emphasized that an acceptable function need not preserve lengths, only the ratio of lengths. It can carry a trapezoid onto a much larger or much smaller trapezoid, but the ratio of lengths of the parallel sides of each trapezoid must be the same. The ratio condition exists only if the segments are parallel. Parallel segments become congruent if the ratio of their lengths = 1.

[2.1] (Acceptable functions) A function is acceptable if it satisfies all of the following conditions:
  (a) It carries all of the points of some linear universe onto a linear universe;
  (b) It carries the origin of the first linear universe onto the origin of the second linear universe;
  (c) If two line segments are parallel, then it carries them onto parallel line segments, providing they are not points. Furthermore, the ratio of the lengths of the segments in the first linear universe equals the ratio of the lengths of the segments in the second linear universe.
  (d) It carries the four vertices of a parallelogram in the first linear universe onto the four vertices of a parallelogram in the second linear universe.

The acceptable functions belong to "linear geometry". If condition (b) is removed, then the functions only belong to "affine geometry."

Some of these conditions are redundant. It is possible to prove (d) from (c).
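Conditions (b), (c) and (d) can be tested on a sample function. The sketch below assumes the map F(x,y) = (5x+6y, 8x-7y) as the candidate acceptable function and picks arbitrary sample points; it compares displacement vectors rather than lengths, which is a slightly stronger check than the ratio of lengths in (c).

```python
def F(v):
    """Sample candidate function of the plane (an assumption for illustration)."""
    x, y = v
    return (5 * x + 6 * y, 8 * x - 7 * y)

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

O = (0, 0)

# (b) origin goes to origin
assert F(O) == O

# (d) parallelogram O, P, R, Q with R = P + Q: in the image, the side from
# F(Q) to F(R) must equal the side from F(O) to F(P) as a displacement.
P, Q = (2, 1), (1, 3)
R = (P[0] + Q[0], P[1] + Q[1])
assert sub(F(R), F(Q)) == sub(F(P), F(O))

# (c) parallel segments AB and CD with AB = 2 * CD as displacements:
# the images must satisfy the same relation with the same factor.
A, B = (1, 1), (3, 5)
C, D = (0, 2), (1, 4)
ratio = 2
assert sub(B, A) == tuple(ratio * c for c in sub(D, C))
assert sub(F(B), F(A)) == tuple(ratio * c for c in sub(F(D), F(C)))
```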


The use of position vectors makes the discussion shorter and, hopefully, more understandable. Since linear universes contain an origin O, position vectors exist and any point may be located by a position vector: p = OP locates point P in some linear universe. Let F be a function from that linear universe onto a linear universe. Then F carries point P onto some point F(P) in the second linear universe and therefore F(p) = F(OP) = F(O)F(P) = OF(P) is a position vector from O to point F(P) and locates point F(P) in the second linear universe. It will be convenient in discussions to switch between points and their position vectors that locate the points. Therefore, F carries points onto points as well as their position vectors onto position vectors.

The sum of two position vectors p and q can be constructed geometrically using a parallelogram. The points Q, O, P are three adjacent vertices of the parallelogram. The fourth point S is located by the position vector OS, which is a diagonal of the parallelogram. If F is an acceptable function, then F carries the origin onto the origin and parallelograms onto parallelograms. Therefore, points F(Q), O, F(P) are three of the vertices of a parallelogram. The fourth point is located by the position vector obtained by adding vectors F(p) and F(q). But the acceptable function carries parallelogram onto parallelogram and therefore carries point S onto this fourth point. Therefore,

F(p) + F(q) = OF(S) = F(OS) = F(p + q)      for any position vectors p and q in the first linear universe
Therefore,
(*)      F(p + q) = F(p) + F(q)
This makes F an additive homomorphism. All additive homomorphisms carry 0 onto 0. To show this, note that since 0 = 0 + 0, F(0) = F(0 + 0) = F(0) + F(0). Subtracting F(0) from both sides of F(0) = F(0) + F(0) gives 0 = F(0).

Let P' be any point on the line through distinct points O and P. Suppose O is not between P' and P as shown in the adjacent figure. This makes the ratio λ = OP'/OP non-negative. An acceptable function F carries these collinear points onto collinear points O, F(P') and F(P). Because F preserves ratios of lengths of line segments, λ = OF(P')/OF(P). Therefore for both figures,

OP' = λOP      and     OF(P') = λOF(P)
In the language of position vectors this translates into
p' = λp      and      F(p') = λF(p)
Replace p' in this last equation by λp to get
(*)     F(λp) = λF(p)

But the discussion has supported this equality only for λ>0. (It is trivially true for λ=0.) Some simple manipulations will show that the equation is true for λ<0 without direct geometric support.
Since F is a homomorphism,

F(λp + (-λ)p) = F(λp) + F((-λ)p)
Therefore,
0 = F(0) = F(λp) + F((-λ)p)
This means that
F(λp) = - F((-λ)p)
But when λ < 0, -λ > 0, so from the argument above, F((-λ)p) = (-λ)F(p). Therefore,
F(λp) = -(-λ)F(p)
or
(**)      F(λp) = λF(p)
Equations (*) and (**) become necessary conditions that a function F be a linear transformation.



Linear Expressions Relate Matrices and Corresponding Functions

It is simple to find values for the linear expressions
(*)      5x + 6y    and    8x - 7y
if numerical values are given to x and y. If x and y are given values 3 and 4 respectively, then the expressions become
5(3) + 6(4) = 39    and    8(3) - 7(4) = - 4
By giving other values to x and y simultaneously, the two expressions take other numerical values. Form the horizontal arrays (3,4) and (39,-4). To the first array the linear expressions associate the second array. This action can be written
(3,4) --> (39,-4)
The expressions can also be applied to vertical arrays:
(3,4)' --> (39,-4)'
The reader can easily verify:
(1,-1)' --> (-1,15)',   (4,1)' --> (26,25)',   (α,β)' --> (5α + 6β, 8α - 7β)'
Consider these arrays as locating points in a plane. For each point (x,y)' in the plane the expressions assign a unique point (5x+6y,8x-7y)'. But this action satisfies the definition of a function F, carrying points in a plane onto points in the same plane:
(**)      F(x,y) = (5x+6y,8x-7y)
Here,
F(3,4) = (39,-4),   F(1,-1) = (-1,15),   F(4,1) = (26,25)
It is convenient to use the same function F for vertical arrays:
F(3,4)' = (39,-4)',   F(1,-1)' = (-1,15)',   F(4,1)' = (26,25)'
Using the coefficients of the linear expressions (*), form a 2x2 matrix:
      M  =  [ 5   6 ]
            [ 8  -7 ]
Thinking of M(x,y)' as a matrix product of a 2x2 matrix and a 2x1 matrix:
      M(x,y)' = (5x+6y, 8x-7y)'
Compare this equation involving matrices with the definition (**) of function F. Intuitively speaking, M and F do the same thing to points in the plane. More exactly, M and F perform the same action on the coordinates of every point in the plane. The only difference is notational: F involves horizontal arrays, but M involves vertical arrays.
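The agreement between the matrix and the function can be checked directly. In this minimal sketch the matrix M is built from the coefficients of the linear expressions 5x + 6y and 8x - 7y, and mat_vec is a hypothetical helper name for the usual row-times-column product.

```python
# Rows of M are the coefficients of 5x + 6y and 8x - 7y.
M = [[5, 6],
     [8, -7]]

def mat_vec(matrix, column):
    """Product of a matrix (list of rows) with a column array."""
    return [sum(a * x for a, x in zip(row, column)) for row in matrix]

def F(x, y):
    """The function (**) from the text."""
    return (5 * x + 6 * y, 8 * x - 7 * y)

# M acting on a column and F acting on the same coordinates agree.
for point in [(3, 4), (1, -1), (4, 1)]:
    assert tuple(mat_vec(M, point)) == F(*point)
```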


There is a similar discussion for 3x3 matrices and points in space. Given the linear expressions,
x + 2y + 3z,   4x + 5y + 6z,   7x + 8y + 9z
the function F can be formed: F(x,y,z) = (x+2y+3z, 4x+5y+6z, 7x+8y+9z) and the matrix M that corresponds to F is:
      M  =  [ 1  2  3 ]
            [ 4  5  6 ]
            [ 7  8  9 ]
The reader can verify that the product   M(x,y,z)'   =   (x+2y+3z, 4x+5y+6z, 7x+8y+9z)'.


The linear expressions
2x + 3y,   5x - 2y,   -4x + y
receive values for x and y and return values for the three expressions. Therefore, they carry points from a plane into space:
(x,y)' --> (2x+3y, 5x-2y, -4x+y)'
The function for this action is F defined by F(x,y) = (2x+3y, 5x-2y, -4x+y).   The matrix is
      M  =  [  2   3 ]
            [  5  -2 ]
            [ -4   1 ]
For example, let M act on point (6,7)' and let F act on the same point (6,7):
M(6,7)' = (2(6)+3(7), 5(6)-2(7), -4(6)+1(7))' = (33, 16, -17)'
F(6,7) = (33, 16, -17).
By the matrix product a non-square matrix "changes" a column array of some length into a column array of a different length. The corresponding function carries an array of some length onto an array of a different length. Intuitively speaking, they both carry points in some dimension onto points in another dimension.
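The change of array length can be seen in a short computation. This sketch assumes the 3x2 matrix M whose rows hold the coefficients of 2x + 3y, 5x - 2y and -4x + y; mat_vec is a hypothetical helper name for the row-times-column product.

```python
# Rows of M are the coefficients of 2x + 3y, 5x - 2y and -4x + y.
M = [[2, 3],
     [5, -2],
     [-4, 1]]

def mat_vec(matrix, column):
    """Product of a matrix (list of rows) with a column array."""
    return [sum(a * x for a, x in zip(row, column)) for row in matrix]

def F(x, y):
    return (2 * x + 3 * y, 5 * x - 2 * y, -4 * x + y)

image = mat_vec(M, (6, 7))
assert len(image) == 3                 # a length-2 array became a length-3 array
assert tuple(image) == F(6, 7)
# The middle entry is 5(6) - 2(7) = 30 - 14 = 16.
assert image == [33, 16, -17]
```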



Products of associated matrices are associated with products of linear transformations

The discussion involves only the plane as the linear universe. For other linear universes, the discussions are similar.
Given: Two linear transformations F and G from the plane into the plane with associated matrices M and N respectively. To show that the product NM of the matrices is the associated matrix of the product (composition) GF. G and F carry the special arrays (1,0) and (0,1) onto arrays of length two, say
G(1,0) = (γ1, γ2),   G(0,1) = (δ1, δ2),      F(1,0) = (α1, α2),   F(0,1) = (β1, β2)
The images of the special arrays become vertical columns in the associated matrices N and M:
      N  =  [ γ1  δ1 ]        M  =  [ α1  β1 ]
            [ γ2  δ2 ]              [ α2  β2 ]
Now
GF(1,0) = G(α1, α2)
             = G(α1(1,0) + α2(0,1))
             = α1G(1,0) + α2G(0,1)
             = α1(γ1, γ2) + α2(δ1, δ2)
             = (α1γ1 + α2δ1, α1γ2 + α2δ2)
             = first column of NM

Similarly,
GF(0,1) = G(β1, β2)
             = ...
             = ...
             = ...
             = (β1γ1 + β2δ1, β1γ2 + β2δ2)
             = second column of NM

This makes NM the matrix associated with GF.
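The result can be confirmed on a concrete pair of transformations. In this sketch F and G are hypothetical linear transformations of the plane chosen for illustration, associated_matrix is a helper (not from the text) that places the images of (1,0) and (0,1) in the columns, and mat_mul is the ordinary 2x2 matrix product.

```python
def F(v):
    x, y = v
    return (5 * x + 6 * y, 8 * x - 7 * y)

def G(v):
    x, y = v
    return (2 * x + 3 * y, x - 4 * y)

def GF(v):
    """The product (composition): apply F first, then G."""
    return G(F(v))

def associated_matrix(T):
    """Images of (1,0) and (0,1) become the columns of the matrix."""
    c1, c2 = T((1, 0)), T((0, 1))
    return [[c1[0], c2[0]],
            [c1[1], c2[1]]]

def mat_mul(A, B):
    """Ordinary product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = associated_matrix(F)
N = associated_matrix(G)
NM = mat_mul(N, M)

# The matrix associated with GF is the product NM (note the order).
assert NM == associated_matrix(GF)
```

Swapping the order to mat_mul(M, N) gives the matrix associated with FG instead, which in general differs from NM.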