as luck would have it i am teaching that course right now, and we just reached that section. please let me try my version of that stuff on you.

there are two concepts: 1) a vector space, and 2) a linear map from one vector space to another. for example, R^2 is a vector space, and projection of R^2 onto the x axis is a linear map from R^2 to R^2. also rotation of R^2 through 60 degrees counterclockwise is a linear map from R^2 to R^2. (any mapping of R^2 to R^2 that takes parallelograms to [maybe flattened] parallelograms is a linear map, e.g. projection onto a line through (0,0), rotation about (0,0), or reflection in a line through (0,0).)

given a linear map L:R^2-->R^2, we want to know two things: 1) which vectors in the target are actually hit by the map? i.e. for which vectors b can we solve the equation L(X) = b? 2) if b is a vector for which L(X) = b has solutions, what are all those solutions X?

for example, if L:R^2-->R^2 is projection on the x axis, then L(X) = b can be solved only for those vectors b which lie on the x axis. and given such a vector b on the x axis, the solutions X of L(X) = b consist of exactly all X's on the vertical line through b, perpendicular to the x axis.

the set of solutions X of the equation L(X) = 0 is called the null space of L. it is interesting because of the general principle that the general solution to the equation L(X) = b consists of any particular solution u such that L(u) = b, plus a general solution v of the equation L(X) = 0. i.e. if L(u) = b and L(v) = 0, then L(u+v) = b also. thus if we know all solutions of L(X) = 0, then to find all solutions of L(X) = b we only need to know one solution.

thus given any linear map L:R^2-->R^2, there are two important subspaces: 1) Im(L) = "image of L" = those b such that L(X) = b has a solution X. 2) N(L) = "nullspace of L" = those vectors v such that L(v) = 0.
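here is a tiny numpy sketch of the "particular solution plus nullspace" principle, using the projection-onto-the-x-axis example from above (the specific vectors u, v, b are my own illustrative choices):

```python
import numpy as np

# L(X) = A X is projection of R^2 onto the x axis.
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
b = np.array([3.0, 0.0])   # b lies on the x axis, so L(X) = b is solvable

u = np.array([3.0, 0.0])   # a particular solution: A @ u = b
v = np.array([0.0, 5.0])   # a nullspace vector:    A @ v = 0

print(A @ u)               # [3. 0.]
print(A @ v)               # [0. 0.]
print(A @ (u + v))         # [3. 0.]  -> u + v also solves L(X) = b
```

so every point on the vertical line {u + v : A v = 0} maps to the same b, exactly as described above.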
But since we can always form the space perpendicular to any given space, these two important subspaces give rise also to two more spaces, the spaces perpendicular to the two given ones: N(L)perp and Im(L)perp.

Now a linear map L:R^2-->R^2 is always given by multiplying by some unique matrix, say A. I.e. given a linear map L, there is a matrix A such that L(X) = AX for all X. Then notice that L(X) = 0 means simply that AX = 0, and if you know about dot products and matrix multiplication, this means that X is perpendicular to the rows of the matrix A. So in fact N(L) = the space perpendicular to the row space of A, i.e. N(L) = R(A)perp. We also know that the product AX is simply the linear combination of the columns of A with coefficients from X, so only vectors which are linear combinations of the columns can have the form AX = b. Thus Im(L) = the column space C(A) of A. Moreover, the equations which vanish on Im(L) therefore form the space C(A)perp.

Hence we have for each map L:R^n-->R^m a matrix A such that L(X) = AX for all X. Then: R(A)perp = N(L) = the solutions of L(X) = 0. C(A) = Im(L) = those b such that L(X) = b has a solution X. C(A)perp = the equations that tell you which b can be solved for in L(X) = b. hows that?
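the matrix point of view above is easy to check numerically. a minimal sketch, again using the projection matrix (an illustrative choice):

```python
import numpy as np

# Projection onto the x axis, written as a matrix: L(X) = A X.
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# N(L): A @ v = 0 exactly when v is perpendicular to every row of A,
# i.e. N(L) = R(A)perp. Here the y axis:
v = np.array([0.0, 1.0])
print(A @ v)               # [0. 0.]
print(A[0] @ v, A[1] @ v)  # 0.0 0.0  -> v is perpendicular to both rows

# Im(L) = C(A): A @ X is the combination of the columns of A with
# coefficients from X, so every output lies on the x axis.
X = np.array([3.0, 7.0])
print(A @ X)               # [3. 0.]
```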
Then direct calculation shows that it is not diagonalizable.

Why do we care about diagonalizing matrices? The main reason is ease of computation. If we can write A = S Λ S^-1, then A^1000 = S Λ^1000 S^-1, and the calculations can be performed very quickly. If we had to multiply 1,000 powers of A, this would be very time-consuming. Theoretically, we may not need such a time-saving method, but if we're trying to model any physical system or economic model, we're going to want to run calculations on a computer. And if the matrix is decently sized, these calculations will very quickly cause noticeable time lags. Jordan Canonical Form is the answer. The question? What is the 'nicest' form we can get an arbitrary matrix into?

We already know that to every eigenvalue there corresponds an eigenvector. If an nxn matrix has n linearly independent eigenvectors, then it is diagonalizable. Hence,

Theorem 1: If an nxn matrix A has n distinct eigenvalues, then A is diagonalizable, and for the diagonalizing matrix S we can take the columns to be the n eigenvectors (S^-1 A S = Λ).

In the proof of the above, we see all we needed was n linearly independent eigenvectors. So we obtain

Theorem 2: If an nxn matrix A has n linearly independent eigenvectors, then A is diagonalizable, and for the diagonalizing matrix S we can take the columns to be the n eigenvectors (S^-1 A S = Λ).

Now consider the case of an nxn matrix A that does not have n linearly independent eigenvectors. Then we have

Theorem 3: If an nxn matrix A does not have n linearly independent eigenvectors, then A is not diagonalizable.

Proof: Assume A is diagonalizable by the matrix S. Then S^-1 A S = Λ, or A = S Λ S^-1. The standard basis vectors e1, ..., en are eigenvectors of Λ, and as S is invertible, Se1, ..., Sen are eigenvectors of A, and these n vectors are linearly independent. (Why?) But this contradicts the fact that A does not have n linearly independent eigenvectors. Contradiction; hence A is not diagonalizable.
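The speed-up A^1000 = S Λ^1000 S^-1 can be sketched in a few lines of numpy. The 2x2 matrix below is my own illustrative choice (eigenvalues 1 and 0, so the powers stay bounded), not one from the handout:

```python
import numpy as np

# If A = S L S^-1 with L diagonal, then A^1000 = S L^1000 S^-1,
# and L^1000 just raises the diagonal entries to the 1000th power.
A = np.array([[0.5, 0.5],
              [0.5, 0.5]])

evals, S = np.linalg.eig(A)   # columns of S are eigenvectors of A

fast = S @ np.diag(evals ** 1000) @ np.linalg.inv(S)   # 2 matrix products
slow = np.linalg.matrix_power(A, 1000)                 # ~1000 products

print(np.allclose(fast, slow))   # True
```

For a 2x2 matrix the difference is invisible, but for a decently sized matrix the diagonal route replaces roughly 1,000 matrix multiplications with two.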
So, in studying what can be done to an arbitrary nxn matrix, we need only study matrices that do not have n linearly independent eigenvectors.

Jordan Canonical Form Theorem (JCF): Let A be an nxn matrix. Then there exists an invertible matrix M such that M^-1 A M = J, where J is a block diagonal matrix, and each block is of the form

( λ 1         )
(   λ 1       )
(     λ 1     )
(       . .   )
(         λ 1 )
(           λ )

Note J^1000 is much easier to compute than A^1000. In fact, there is an explicit formula for J^1000 if you know the eigenvalues and the sizes of each block.

II. Notation: Recall that λ is an eigenvalue of A if Det(A - λI) = 0, and v is an eigenvector of A with eigenvalue λ if Av = λv. We say v is a generalized eigenvector of A with eigenvalue λ if there is some number N such that (A - λI)^N v = 0. Note all eigenvectors are generalized eigenvectors. For notational convenience, we write gev for generalized eigenvector, or λ-gev for a generalized eigenvector corresponding to λ.

We say the λ-Eigenspace of A is the subspace spanned by the eigenvectors of A that have eigenvalue λ. Note that this is a subspace, for if v and w are eigenvectors with eigenvalue λ, then av + bw is an eigenvector with eigenvalue λ. We define the λ-Generalized Eigenspace of A to be the subspace of vectors killed by some power of (A - λI). Again, note that this is a subspace.

III. Needed Theorems:

Fundamental Theorem of Algebra: Any polynomial with complex coefficients of degree n has n complex roots (not necessarily distinct).

Cayley-Hamilton Theorem: Let p(λ) = Det(A - λI) be the characteristic polynomial of A. Let λ1, ..., λk be the distinct roots of this polynomial, with multiplicities n1, ..., nk (so n1 + ... + nk = n). Then we can factor p(λ) as p(λ) = (λ - λ1)^n1 (λ - λ2)^n2 * ... * (λ - λk)^nk, and the matrix A satisfies p(A) = (A - λ1 I)^n1 (A - λ2 I)^n2 * ... * (A - λk I)^nk = 0.

Schur's Lemma (Triangularization Lemma): Let A be an nxn matrix.
Then there exists a unitary U such that U^-1 A U = T, where T is an upper triangular matrix. Proof: construct U by fixing one column at a time.

IV. Reduction to Simpler Cases: In the rest of this handout, we will always assume A has eigenvalues λ1, ..., λk, with multiplicities n1, ..., nk (so n1 + ... + nk = n). We will show that we can find n1 λ1-gev, n2 λ2-gev, ..., nk λk-gev, such that these n vectors are linearly independent (LI). These will then form our matrix M. So, if we can show that the n generalized eigenvectors are linearly independent, and that each one 'block diagonalizes' where it should, it is enough to study each λ separately.

For example, we'll show it's sufficient to consider λ = 0. Let λ be an eigenvalue of A. Then if vj is a generalized eigenvector of A with eigenvalue λ, vj is a generalized eigenvector with eigenvalue 0 of B = A - λI: A vj = λ vj + vj-1 → B vj = 0 vj + vj-1. So, if we can find nj LinIndep gev for B corresponding to 0, we've found nj LinIndep gev for A corresponding to λ. The next simplification is that if we can find nj LinIndep gev for U^-1 B U, then we've found nj LinIndep gev for B. The proof is a straightforward calculation: let v1, ..., vm be the m LinIndep gev for U^-1 B U; then U v1, ..., U vm will be m LinIndep gev for B.

Lemma 4: Let p(λ) = (λ - λ1)^n1 (λ - λ2)^n2 * ... * (λ - λk)^nk be the char poly of A, so p(A) = (A - λ1 I)^n1 (A - λ2 I)^n2 * ... * (A - λk I)^nk. For 1 ≤ i ≤ k, consider (A - λi I). This matrix has exactly ni LinIndep generalized eigenvectors with eigenvalue 0, hence A has ni LinIndep generalized eigenvectors with eigenvalue λi.

Proof: For notational simplicity, we'll prove this for λ = λ1, and let's write m for the multiplicity of λ (so m = n1). Further, by the above arguments we see it is sufficient to consider the case λ = 0. By the Triangularization Lemma, we can put B = A - λI (which has first eigenvalue = 0) into upper triangular form.
What we need from the proof is that if we take the first column of U1 to be v, where v is an eigenvector of B corresponding to eigenvalue 0, then the first column of Tn = U1^-1 B U1 will be (0, 0, ..., 0)^T. The lower (n-1)x(n-1) block of Tn, call it Cn-1, is upper triangular, hence the eigenvalues of B appear as the entries on its main diagonal. Hence we can again apply the triangularization argument to Cn-1, and get an (n-1)x(n-1) unitary matrix U2b such that U2b^-1 Cn-1 U2b = Tn-1 has first column (0, 0, ..., 0)^T, and the rest is upper triangular. Hence we can form an nxn unitary matrix U2

( 1  0 ... 0 )
( 0          )
( .   U2b    )
( 0          )

Then

U2^-1 U1^-1 B U1 U2 =

( 0 * * ... * )
( 0 0 * ... * )
( 0 0 * ... * )
( . . . ... . )
( 0 0 0 ... * )

The net result is that we've now rearranged our matrix so that the first two entries on the main diagonal are zero. By 'triangularizing' like this m times, we can continue so that the upper mxm block is upper triangular with zeros along the main diagonal, and the remaining entries on the main diagonal are non-zero (as we are assuming the multiplicity was m). Call this matrix Tm. Note there is a unitary U such that Tm = U^-1 B U. Remember, Tm and B are nxn matrices, not mxm matrices.

Sublemma 1: At most m vectors can be killed by powers of Tm. Proof: direct calculation. When we multiply powers of Tm, we still have an upper triangular matrix. The entries on the main diagonal are zero for the first m terms, and then non-zero for the remaining terms (because the multiplicity of the eigenvalue λ = 0 is exactly m). Hence the vectors em+1, em+2, ..., en are not killed by powers of Tm, and so powers of Tm can have a nullspace of dimension at most m.

We now show that exactly m vectors are killed by powers of Tm. This follows immediately from

Sublemma 2: Let C be an mxm upper triangular matrix with zeros along the main diagonal. Then C^m is the zero matrix. Proof: straightforward calculation, left to the reader.
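Sublemma 2 is easy to check numerically. A minimal sketch for a hypothetical 4x4 strictly upper triangular C (my own example matrix):

```python
import numpy as np

# C is 4x4, upper triangular with zeros on the main diagonal.
C = np.triu(np.arange(1.0, 17.0).reshape(4, 4), k=1)

# Each power pushes the nonzero entries one diagonal further up:
C3 = np.linalg.matrix_power(C, 3)
print(C3[0, 3])                        # 168.0 -- only corner entry survives
print(np.linalg.matrix_power(C, 4))    # the zero matrix: C^4 = 0
```

The same picture drives Sublemma 1: powers of Tm keep zeros in the upper mxm diagonal block while the last n-m diagonal entries stay non-zero.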
Hence the nullspace of (Tm)^m (and all higher powers) has dimension exactly m, which proves there are m generalized eigenvectors of B with eigenvalue λ = 0. These vectors are LinIndep: as Tm is upper triangular with zeros on the main diagonal for the first m entries, Tm has m LinIndep gev e1, ..., em with eigenvalue 0. As B = U Tm U^-1, B has m LinIndep gev Ue1, ..., Uem with eigenvalue 0 (show that B cannot have any more LinIndep gev with λ = 0).

Returning to the proof of Lemma 4, we see that there are exactly n1 vectors killed by (A - λ1 I)^n1, ..., nk vectors killed by (A - λk I)^nk. The only reason we go through this triangularizing is to conclude that there are exactly ni vectors killed by (A - λi I)^ni. Try to prove this fact directly!

IV. Appendix: Representation of λ-Generalized Eigenvectors. We know that if λ is an eigenvalue with multiplicity m, there are m generalized eigenvectors, satisfying (A - λI)^m v = 0. We describe a very useful way to write these eigenvectors. Let us assume there are t eigenvectors, say v1, ..., vt. We know there are m λ-gev. Note if v is a λ-gev, so are (A - λI)v, (A - λI)^2 v, ..., (A - λI)^m v. Of course, some of these may be the zero vector. We claim that each eigenvector is the termination of some chain of λ-gev. In particular, we have

(A - λI) v1,a = v1,a-1
(A - λI) v1,a-1 = v1,a-2
...
(A - λI) v1,1 = 0, where v1 = v1,1,

and

(A - λI) v2,b = v2,b-1
(A - λI) v2,b-1 = v2,b-2
...
(A - λI) v2,1 = 0, where v2 = v2,1,

all the way down to

(A - λI) vt,r = vt,r-1
(A - λI) vt,r-1 = vt,r-2
...
(A - λI) vt,1 = 0, where vt = vt,1,

and a + b + ... + r = m. We emphasize that we have not shown that such a sequence of λ-gev exists. Later we shall show how to construct these vectors, and then in Lemma 8 we will prove they are Linearly Independent. For now, we assume their existence (and linear independence), and complete the proof of Jordan Canonical Form. Let us say a λ-gev is a pure-gev if it is not an eigenvector. Thus, in the above we have t eigenvectors, and m - t pure-generalized eigenvectors.
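The chain picture can be sketched concretely with a single 2x2 Jordan block (an illustrative matrix of my own, with eigenvalue 2): e1 is a genuine eigenvector, and e2 is a pure-gev one step up the chain.

```python
import numpy as np

# A 2x2 Jordan block with eigenvalue 2.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
B = A - 2 * np.eye(2)      # B = A - lambda*I

e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

print(B @ e1)              # [0. 0.]  -> e1 is an eigenvector
print(B @ e2)              # [1. 0.]  -> (A - 2I) e2 = e1, so e2 is a pure-gev
print(B @ B @ e2)          # [0. 0.]  -> killed by (A - 2I)^2
```

So the chain e2 -> e1 -> 0 has length 2 = the multiplicity, exactly the pattern (A - λI) v2 = v1, (A - λI) v1 = 0 above.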
For notational convenience, we often label the λ-generalized eigenvectors by v1, ..., vm. Thus, for a given j, we have (A - λI) vj = 0 if vj is an eigenvector, and (A - λI) vj = vj-1 if vj is a pure-gev.

V. Linear Independence of the λ-Generalized Eigenspaces. Assume the n1 gev corresponding to λ1 are linearly independent amongst themselves, and the same for the n2 gev corresponding to λ2, and so on. We now show that the n gev are linearly independent. This fact completes the proof of Jordan Canonical Form (of course, we still must prove the ni λi-gev are linearly independent amongst themselves).

Assume we have some linear combination of the n gev equaling zero. By LC λi-gev we mean a linear combination of the ni λi-gev (this is just to simplify notation). Then (LC λ1-gev) + (LC λ2-gev) + ... + (LC λk-gev) = 0. We'll show first that the coefficients in the first linear combination are all zero. Recall the characteristic polynomial gives p(A) = (A - λ1 I)^n1 (A - λ2 I)^n2 * ... * (A - λk I)^nk. Define g1(A) = (A - λ2 I)^n2 (A - λ3 I)^n3 * ... * (A - λk I)^nk. Then g1(A) kills (LC λ2-gev), g1(A) kills (LC λ3-gev), ..., g1(A) kills (LC λk-gev). Why? For example, the λ2-gev are all killed by (A - λ2 I)^n2, and hence, as g1(A) contains this factor, they are all killed.

What does g1(A) do to (LC λ1-gev)? Again, for notational simplicity we'll write m for n1, and v1, ..., vm for the corresponding m λ1-gev. We can look at it factor by factor, as all the different terms (A - λi I) commute.

Lemma 5: For i > 1, let the vj's be the gev corresponding to λ1. If vj is a pure-gev, then (A - λi I) vj = (λ1 - λi) vj + vj-1. If vj is an eigenvector, then (A - λi I) vj = (λ1 - λi) vj. Again, the proof is a calculation: if vj is a pure-gev, (A - λi I) vj = (A - λ1 I + λ1 I - λi I) vj = (A - λ1 I) vj + (λ1 I - λi I) vj = vj-1 + (λ1 - λi) vj. The proof when vj is an eigenvector is similar.

Now we examine g1(A) ((LC λ1-gev) + (LC λ2-gev) + ... + (LC λk-gev)) = 0.
Clearly g1(A) kills the last k-1 linear combinations, and we are left with g1(A) (LC λ1-gev) = 0. Let's say LC λ1-gev = a1 v1 + ... + am vm. We need to show that all the aj's are zero. (Remember we are assuming the vj's are linearly independent; we will prove this fact when we construct the vj's.) Assume am ≠ 0. From our labeling, vm is either an eigenvector, or a pure-gev that starts a chain leading to an eigenvector: vm, (A - λ1 I) vm, (A - λ1 I)^2 vm, .... Note no other chain will contain vm.

We claim that g1(A) (LC λ1-gev) will contain a non-zero multiple of vm. Why? When each factor (A - λi I) hits a vj, one gets back (λ1 - λi) vj + vj-1 if vj is not an eigenvector, and (λ1 - λi) vj if vj is an eigenvector. Regardless, we always get back a non-zero multiple of vj, as λ1 ≠ λi. Hence direct calculation shows the coefficient of vm in g1(A) (LC λ1-gev) is

am (λ1 - λ2)^n2 (λ1 - λ3)^n3 * ... * (λ1 - λk)^nk.

As we are assuming the different λ's are distinct (remember we grouped the eigenvalues together to have multiplicity), this coefficient can vanish only if am does. As we are assuming v1 through vm are linearly independent, the only way the coefficient of vm can be zero is if am = 0. Similar reasoning implies am-1 = 0, and so on. Hence we have proved:

Theorem 5: Assuming that the ni generalized eigenvectors associated to the eigenvalue λi are linearly independent (for 1 ≤ i ≤ k), the n generalized eigenvectors are linearly independent. Furthermore, there is an invertible M such that M^-1 A M = J.

The only item not immediately clear is what M is. As an exercise, show that one may take the columns of M to be the generalized eigenvectors of A. They must be put in a special order. For example, one may group all the λ1-gev together, the λ2-gev together, and so on. For each i, order the λi-gev as follows: say there are t eigenvectors, which give sequences v1,1, ..., v1,a, v2,1, ..., v2,b, ..., vt,1, ..., vt,r. Then this ordering works (exercise). VI.
Finding the λ-gev: The above arguments show we need only find the ni generalized eigenvectors corresponding to the eigenvalue λi; these will be of the form (A - λi I) vj = 0 or (A - λi I) vj = vj-1. Moreover, we've also seen we may take λi = 0 without loss of generality. For notational convenience, we write λ for λi and m for ni. So we assume the eigenvalue λ = 0 to have multiplicity m. Hence in the sequel we show how to find m generalized eigenvectors of an mxm matrix whose mth power vanishes. (By the triangularization above, finding m such generalized eigenvectors for this matrix is equivalent to finding m generalized eigenvectors for the original nxn matrix A.)
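The whole recipe can be sketched end-to-end on a small example. The 2x2 matrix below is my own illustrative choice: it has the single eigenvalue 2 with multiplicity 2 but only one eigenvector, so a generalized eigenvector is needed to fill out M.

```python
import numpy as np

# A defective matrix: char poly (x - 2)^2, but only one eigenvector.
A = np.array([[ 1.0, 1.0],
              [-1.0, 3.0]])
B = A - 2 * np.eye(2)      # B^2 = 0, so every vector is a 2-gev of A

v1 = np.array([1.0, 1.0])  # eigenvector:  B @ v1 = 0
v2 = np.array([0.0, 1.0])  # chain vector: B @ v2 = v1

# Order the chain eigenvector-first as columns of M; then M^-1 A M = J.
M = np.column_stack([v1, v2])
J = np.linalg.inv(M) @ A @ M
print(J)                   # [[2. 1.], [0. 2.]] -- a single Jordan block
```

In practice one finds v2 by solving (A - λI) v2 = v1, i.e. by walking a chain back from the eigenvector, exactly as in the chain representation of Section IV.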
Why thank you for the lesson :) I get it now