Educação matemática pela arte
Gusmão, Lucimar Donizete
2013-08-28
Publisher
Date
1976-11
Description
A system is conceived of as being slowly varying if it changes slowly enough to permit identification to within a specified error. A generic model is developed to study the identifiability and identification of slowly varying systems. The model is suitable for a large variety of nonlinear, time-varying, causal, bounded memory systems; it has finitely many parameters and is linear in its parameters. Results are obtained with the use of this general model that give guaranteed accuracy of identification as a function of the prior knowledge of the unknown system, the maximum rate of time variation of the system, and the characteristics of output observation noise. To derive these results, a recursive estimation procedure is developed for time-discrete linear dynamical system structures in which the observation noise is statistical but the dynamic equation noise is nonstatistical and is known only to be bounded.
Subject
Identifier
Information and Control
Fiske, Philip H., Root, William L. (1976/11)."Identifiability of slowly varying systems." Information and Control 32(3): 201-230.
Language
Rights
Show preview
Hide preview
INFORMATION AND CONTROL 32, 201-230 (1976)
Identifiability of Slowly Varying Systems*
PHILIP H. FISKE
The Analytic Sciences Corporation, Reading, Massachusetts 01867
AND
WILLIAM L. ROOT
Aerospace Engineering Department, The University of Michigan, Ann Arbor, Michigan 48109
A system is conceived of as being slowly varying if it changes slowly enough to permit identification to within a specified error. A generic model is developed to study the identifiability and identification of slowly varying systems. The model is suitable for a large variety of nonlinear, time-varying, causal, bounded memory systems; it has finitely many parameters and is linear in its parameters. Results are obtained with the use of this general model that give guaranteed accuracy of identification as a function of the prior knowledge of the unknown system, the maximum rate of time variation of the system, and the characteristics of output observation noise. To derive these results, a recursive estimation procedure is developed for time-discrete linear dynamical system structures in which the observation noise is statistical but the dynamic equation noise is nonstatistical and is known only to be bounded.
1. INTRODUCTION
Our purpose in this paper is to investigate the identifiability of systems that are changing slowly with time in an unpredictable way. Except in trivial situations, if a system is not time-invariant, a finite-time record of input and output data cannot by itself yield a precise identification of the system, even as it existed during the period the data were taken. The response to the one particular input or sequence of inputs used will be known, but what the response would have been to other inputs cannot be found in general. Even less can be known about the future behavior of the system, unless there is auxiliary information. However, it is reasonable and it is
* Research supported by Air Force Office of Scientific Research, Air Force Systems Command, USAF, under AFOSR Grant Number 72-2328.
201 Copyright © 1976 by Academic Press, Inc. All rights of reproduction in any form reserved.
202 FISKE AND ROOT
common practice to try to identify the unknown systems that are slowly time-varying as if they were time-invariant, updating the identification from time to time in order to track the changes. Such an approach is valid if the system is changing slowly enough to permit sufficiently good approxi- mate identification for the purpose at hand. The problem we address is: How fast can the system change and still permit satisfactory identification from input-output data ? The answer to this question depends of course on the criterion of satisfactory identifiability. It also depends on the extent of prior knowledge of the system (e.g., is the system known to be linear ?) and on the characteristics and state of prior knowledge of observation noise.
Identifiability can be studied in terms of any system model that is capable of providing a sufficiently faithful representation of the unknown system in question. We choose a particular kind of generic model (i.e., a model with various undetermined parameters) that has the properties of being extremely flexible and of being linear in the parameters. The flexibility of the model makes it applicable to a wide class of systems, nonlinear as well as linear. The linearity in the parameters means that the identification is always a linear estimation problem, which is advantageous when general error formulas are desired that will show the interplay among the factors that control the identifiability. The essential restrictions on generality imposed by the model structure we use are: (1) Input observations must be noise-free, and (2) the unknown system must have finite memory less than or equal to some known bound. Since the theory is an approximation theory, it is not critical in the final interpretation that these conditions be exactly satisfied; however, the mathematical analysis is based on these assumptions.
The generic model is developed in the next two sections. It is based on the notions of classes of systems and their e-representations. In Section 3 a theorem is stated about the identifiability of time-varying systems with noise-free observations of output. This is a relatively simple result which is preparatory to the main identifiability theorem, Theorem 5 of Section 5.
A recursive statistical estimation procedure is developed in Section 4, which is also preparatory to Theorem 5. This procedure applies to a mixed linear model with both stochastic and unknown-but-bounded terms, and is new as far as we are aware. Since the estimation problem is perhaps of some interest in itself, we have treated it a little more generally than is necessary for the application in Section 5.
Some of the results of this paper were presented in the conference papers (Fiske and Root, 1974; Fiske, 1975a); see also Fiske, (1975b), where there is further related material.
SLOWLY VARYING SYSTEMS 203
2. PRELIMINARIES CONCERNING e-REPRESENTATIONS
The concept of e-representation is introduced and developed somewhat by Root (1971, 1975a); the use of e-representations in identification is discussed abstractly by Root (1973). For the convenience of the reader a summary of those definitions and results concerning e-representations which are needed is provided here.
A class of bounded systems is the 4-tuple 50 = (~/, f, O, ~), where ~ is a Banach space, O and 2~ are metric spaces, and f is a continuous map from the topological product O × ~ into Yg, bounded on 5~. W, 0, and 5F are called the output space, system parameter space, and input space, respectively. If ~ E 0, s = (Y/, f, ~, 5~) is a system belonging to 5 °. The equation y = f(a, x), x ~X, describes the transformation of inputs to outputs for the system s.
A class of systems 50 is prelinear if 6g is a subset of a linear space and
f(ca~l -k c2~2, x) : clf(~Xl , X) @ c2f(~x2, X)
for all x c W whenever the scalars c 1 , c 2 and the parameters ~1, c~2 are such that all three terms are defined. Note that the systems in a prelinear class need not have any linearity properties.
Let ~" = ~-(W, ~J) denote the set of all bounded continuous maps from into ~J made into a Banach space in the standard way (see, e.g., Dieudonn6,
1960); i.e., if F E Y , [] F ]l is defined by [] F ]l = sups. [I F(x)ll, where the norm on the right is the norm in ~J. Define g: ~ × W --+ ~ by g(F, x) = F(x). Then g is continuous on ~ × 5~ and actually linear in ~-. Thus 50~ = (~, g, ~ , W) is a linear class of bounded systems.
Note that f(~, ") defines an element, call it HE, of J ( f , ~). Let ¢ be the map from O into ~(2~, Y/) defined by ¢(~) = HE, and put ~f' =a ¢(0). Then 500 = (~/, g, g/t°, YY) is a class of systems, and is a subclass of 50o~. 500 is called the natural representation of 50 = (~, f, 0, ~) and ~b the natural mapping for the class 50. 500 is always a prelinear class, whether 50 is or not. I f 50 is itself a prelinear class then it follows that ¢ is a restriction of a linear map from the linear span of O into ~. I f ~b is injective and O and ~ are compact metric spaces, then 50 is equivalent to its natural representation 500 (see Root, 1975a, for definition and details) and nothing is lost by con- sidering 500 directly.
Let ~ = (Y/, f l , O1,5~) be another class of bounded systems with natural mapping ~b 1 . I f there exists a map ~1 from 9f' into O 1 such that [l H - - ~b 1 o ¢I(H)I[ ~< e for all H ~ ~t', (501, ¢1) is said to be an e-representation of ~0 (and also to be an e-representation of any class for which ~0 is the
204 FISKE AND ROOT
natural representation). The definition does not require that ~b 1 o¢1(H ) necessarily belong to W. It does require that 50 and 5:1 have the same input and output spaces. An e-representation (5:1, ¢1) of -5: is linear if 5:1 is a prelinear class and ¢1 is a restriction of a linear mapping; it is finite- dimensional if 51 is a subset of a finite-dimensional Euclidean space; it is continuous if both ¢1 and ¢1 are continuous; it is determined by ~o, Y'o C ~, if the map ¢1 depends only on the functions H, H E W, restricted to ~0 • The basic existence theorem is the following.
THEOREM (Root (1975a)). Let ~/ be a Banach space and 5 and ~ be compact metric spaces. Let 5: = (~, f, 5 , f ) be a class of bounded systems. Given e > O, there exists a continuous, finite-dimensional e-representation (5:1, ¢1) of 5: that is determined by a finite subset fo C Y:. 5:1 is a prelinear class. I f ~t is a Hilbert space, then in addition (~1 can always be a linear map so that (5:1, ¢1) is a linear e-representation. 1
The proof of this theorem is by construction and the formulas given by the construction are needed here; so some further discussion is necessary. It is shown that one can find a finite-dimensional subspace ~ ' of ~' that has a subset arbitrarily close to the total image set Y{( f ) C Y/. A continuous map ~r from ~ into ~/' is devised that carries points in d/f (W) into sufficiently nearby points of ~t,. In case ~' is a Hilbert space, ~r can be taken to be the orthogonal projection on ~ ' , and this is the only case we consider. Letting y ' = ~ry, y ' is expressed in terms of an arbitrary basis {e I .... , eu} of ~/'
M y '= ~, e,*(y ' )e , , (2.1)
i=l
where the ei* are continuous linear functionals. Now if the determining set f0 = {xl .... , xx} , the parameter space 51 of the c-representation is a subset of RuM. The map ¢1: ~- -+ 51 is defined as follows. ¢1(H) = h, where h is that point in R gu with coordinates, with respect to an arbitrary ortho- normal basis, given by
am~ = e~* o 7r o H(x,), m = 1 .... , M; n = 1 ..... N. (2.2)
A set of continuous interpolation functionals {yl(x),..., ylv(X)}, x e f , is constructed. These functionals depend on the set {x 1 ,..., X2v}, and have the properties
1 It can happen that ¢1 is l inear when ~d is only a Banach space, but it is not known to us whether this can be guaranteed.
SLOWLY VARY ING SYSTEMS 205
(1) 7.(x) ~0, n= 1 ..... N, xE~, N
(2) Z,,=, y,~(x) -= 1, x e W,
(3) 7,,~(x~) = 8,~, n, k = 1,..., N .
Finally, the map f , is defined by
f , (h, x) = 7,~(x) am %. (2.3) m=l n=l
An e-representation of the form just described is called a standard e-repre- sentation. It depends on the set {xl ,..., xN} , and that set is called its determining set.
The integers M and N depend on f , dr' (or 0/) and e, but are not uniquely defined simply by the condition that (~1, ¢1) is a standard C-representation. However, given f , 3/d, and e there is a smallest N such that there exists a standard e-representation with a determining set with N elements (we are not concerned with the size of M). With fixed ~0 and W this minimum N is nondecreasing as e is made smaller; with fixed ¢ the minimal N is nondecreasing as either f or 35 ° is made larger in the sense of set inclusion.
Two simple preparatory lemmas are needed that are not given in the references.
LEMMA 1. For a standard e-representation with ~ a Hilbert space and {era}, m -~ 1 .... , M, an orthonormal set, ~ is a restriction of a linear map from o~(97, ~) -+ R NM with linear operator norm ~N1/2.
Proof. Let R, M be M-dimensional Euclidean space with basis {e, , . . . , eM} , and let the image of 3(¢ ~ given x~ under the mapping defined by (2.2) lie in R~ M. Let R MN = R1 M @ "'" (~) R~ M, and assign the basis {emn}, m = 1,..., M; n = 1,..., N, where em~ is the element of R MN corresponding to e~ in R, M. Then with respect to this basis, let h = ~,(H), h' ---- q~,(H') have coordinates {a~}, {a~} in R NM, respectively. From (2.2),
(a,~ - - a~,~) % II H(x.) - H'(x.)ll ~
II H -- H ' II ~, n = 1,..., N .
Hence,
llh - - h'l] ~ ~ NIl H - - H'[[~
from which the assertion follows. |
206 FISKE AND ROOT
LEMMA 2. With the same hypotheses as for Lemma 1, ¢~ is a restriction of a linear map from R NM -+ ~-(£r ~) with linear operator norm bounded by one.
Proof. Because of the proliferation of norms in different spaces which appear, subscripts are used with the norms initially, except for the linear operator norm of ~b 1 which is denoted 1 ¢1 ]. Then,
I ¢1 I = sup II ¢1(h)1l R~ 11 h II~NM
= sup sup IlA(h, x)lle~ RNM .~
= sup s 7.(x) a~. R NM u.vP ~a=l 1
,} em ] R M
~< sup 1 sup [ a,~,~ [ R NM "~" \~'t=l
~< sup 1 sup (x) • max [am~ RNM X 1 n
l = sup I R NM N
because ~,~=1 y~(x) = 1 for all x ~ :T. Then
E~=l max. l a~. I¢11 ~< sup EM ~.N a ~ " ~<1. |
3. A MODEL FOR TIME-VARYING SYSTEMS
Some further notations, conventions, and definitions are useful. In what follows we deal with spaces of functions (or equivalence classes of functions) on R 1 or subsets of R1; in each case the functions take values in a finite- dimensional Euclidean space. L2(a, b) denotes the L2-space formed from functions defined on (a, b) and square integrable Lebesgue. We shall identify the spaces L2(a, b) and L~(t + a, t + b), t any real number, without further comment. The symbol A will sometimes be used as a generic symbol for
SLOWLY VARYING SYSTEMS 207
an interval in R 1 (of the form (a, b], if the endpoints matter). The linear operators L~, Pa and P~ are defined on functions x on R 1 by
(Lax)(t) ~- x(t + a),
(Pox)(t) --- x(t), t ~< a,
-= 0, t ~ a,
(P~x)(t) -~ x(t), t ~ A,
=0, t~A.
The symbol W e denotes a set of functions x on R 1 that satisfies the conditions
(i) (shift invariance) x ~ fe ~ Lax ~ We for all a ~ R1;
(ii) (projection property) x ~ W e ~ Pax ~ We , ( I - - Pa)x ~ f ~ and Pax ~ fe for all a and A.
To avoid confusion between different uses of the word system, we introduce the term extended system to represent the kind of ongoing, in general time- varying, system that is to be identified. Precisely, an extended system is a triple (a# e , H e, We) where &r e is as just defined, ~ is a linear function space on R 1 that also satisfies
(iii) shift invariance, and
(iv) projection property.
H e is a map 2 from W~ into oy~. The extended system is causal if PaHe(x) --= P~He(Pax) for all x a We and all a. It has bounded memory m if ( I - - Pa) He(x) = ( I - - Pa) H~[(I - - Pa_~)(x)] for all x a We and all a. These can be combined: An extended system is causal with bounded memory m if and only if for every T > 0
(v) (Pt+r - - Pt) He(x) = (Pt+r -- et) He[(Pt+T - - Pt_~)(x)] for all t and all x ~ We (see Root, 1975b).
The idea we follow in modeling a natural system operating for an indefinite time period is first to represent it as an extended system with certain addi- tional restrictions on We, ~/e and H e, and then to regard the extended system as corresponding to a trajectory in some fixed class of systems, as defined in Section 1. We choose to set up the problem so that the topological function
2 In Root (1975b), spaces satisfying the conditions for £r e and ~/~ are topologized; the H e are taken to be continuous maps, and classes of what are here called extended systems are formed. A much more elaborate structure is developed than is necessary for our purposes here.
208 FISKE AND ROOT
spaces that appear (this does not include £r and g¢~) are LZ-spaces. This is not necessary, but it seems sufficiently general and it facilitates the statistical estimation discussed in the next section.
Let T and m be fixed positive numbers. T will be the duration of each observation interval; m will be a bound on the memory duration. The number m is chosen to fit the problem; either the natural systems in question have bounded memory in, or they can be approximated sufficiently well by having the memory truncated to m. T is arbitrary. Let 5¢' = (go, f, 6g, ~) be a class of systems satisfying the conditions
(a) W is a compact subset of L2(--m, T),
(b) ~/is L2(0, T),
(c) 6g is compact.
We deal with the natural representation of Y, ~o = (q/,g, 2/f, 3?). It follows, (c'), that 3¢g is a compact subset of ~(.gr, g/).
Now let (q/a, H% ~,) be an extended system that satisfies the following additional conditions. The input space ~, satisfies
(vi) P~Ke~ CL~(A) for any A,
(vii) P(e_,~.,+r)£re = L_~£ r.
Note that, by virtue of (i), it is sufficient that (vi) hold for some A and that (vii) hold for some t. The output space gca satisfies
(viii) P~a CL2(A) for any A.
Given any t, the map H a induces a map H, from P(*-m,e+r)fa intoL2( t, t q- T) defined by
H~(xe) = P(e,e+r)He(P(e_~,e+r)x), x c ~a ,
x t = P (e_m,e+r)x .
In fact, because of (v) the first equation can be written H~(xe) = P(e,e+r)H~(x). Since by (vii) all x 6 5~ are translates of such xe, He can be regarded as a map from f into @' = L2(0, T). It is required that H ~ be such that
(ix) HesS .
With 5~ o as given it is clear that any extended system that satisfies the additional conditions (v),..., (ix) generates a trajectory {He}, t ~ R 1, in the subset ~ of o~(5~, ~g). We regard an extended system satisfying these conditions as "identifiable" if it is possible to determine Heo for any t o on the basis of input-output data taken over a finite time interval terminating
SLOWLY VARYING SYSTEMS 209
at time t o -1- T. We regard the extended system as "approximately iden- tifiable" if it is possible to determine Hto to within a prescribed tolerance, in a prescribed sense, under the same conditions. As indicated in the Introduction, we are concerned with approximate identifiability.
To make a measurement on the extended system at t will mean to apply an input signal during the time interval (t -- m, t -[- T] and observe the output during the time interval (t, t @ T]. Measurements may be made for overlapping time intervals, but if they are, the successive inputs are necessarily related. To allow a completely arbitrary choice of successive inputs, we suppose that measurements are made at the times t 1 = 0, t~ = m -[- T,..., t~+ 1 = k(m + T) ..... The maps Htk, denoted Hk, describe a discrete trajectory in W. Setting W~ = H~+ 1 -- H~ gives the equations
g~÷l = g~ + W~, (3.1)
y~ = H~(gk), k = 1, 2 .....
to describe the successive measurements, where 2~ is the input signal applied during the interval (t~ -- m, t~ @ T] and y~ is the output observed during the interval (t~, t~ -[- T].
It is to be noted that even though we are defining a system to be an input- output map, there are no serious restrictions on the "state" of the extended system at any time during the measurement process. In particular, since the duration of the input interval is equal to that of the output interval plus a bound on the duration of the system memory, the "output" for each measurement depends only on the "input" for that measurement, and is independent of the state of the system at the beginning of the measurement. This holds true even in the case where measurements are made for over- lapping time intervals.
To get a finite-parameter model for the H~ we employ e-representations. Let ¢ > 0 be fixed at an arbitrary value; then by virtue of conditions (a), (b), (c) on o ~ there exists a standard e-representation (~,¢ i ) as given by Eqs. (2.2) and (2.3), with determining set {x 1 .... , x~}. As before ~ = (~#, f l , 691, ~), and the parameter set 51 C R rim. Also, rr is the orthogonal projection from ~ onto ~ ' , which is an M-dimensional subspace; {e I ,..., eM} is an orthogonal basis for o#,, and the orthogonal basis for R NM is chosen as in Lemma 1.
The fundamental identification problem is to find He, k = 1, 2,...; we are satisfied if we can find a suitably close estimate of h~ = ¢1H~, the corresponding vector parameter for the standard E-representation. Suppose, temporarily, that the extended system is time-invariant so that
210 FISKE AND ROOT
H k=H l fo ra l l k .Let2 k ,k = 1 .... ,N, be chosen to be the N elements of the determining set, thus xk = xl~, k = 1,..., N. Then, from (2.2)
y~' = ,~(y~) = ~o H~(x~)
M M
= Z (~ro H,(x~), %) em= • am~e~, k = 1,..., N . (3.2) qTq,=l m=l
Let h 1 be represented by the column vector
hi = [an ,..., aM1 ; al~ ,..., aM2 ;.-.; a~N ..... aMN]T;
then (3.2) can be written as a matrix equation
y~' = X ,h i , n = 1,..., N , (3.3)
where y~' is now interpreted as a column vector of length M, and X~ is an M × MN matrix which in partitioned form is X n = [0" "'" i 0 i I i 0 i "'" " 0], each block being M × M and the identity matrix appearing in the nth block. Setting
yl =
i _H ' J
and combining Eqs. (3.3) yields
since
y~ = Xh 1 = h 1 ,
X= =I .
Thus, in this case of noise-free identification of a time-invariant extended system, an identification to within ¢ is obtained by making measurements wkh each input signal in the determining set. The parameter values a~ are actually the coordinates of the projections of the outputs on an appropriate M-dimensional subspace wkh a certain orthonormal basis.
SLOWLY VARYING SYSTEMS 211
Now suppose that the extended system is continually changing so that each measurement is performed on a (slightly) different system. From (3.1) and the argument leading to (3.3) we have
h~+ 1 : h~ Jf-w k,
y~' = Xkh~ , k = 1, 2 .. . . . N ,
(3.5)
as the equations describing the skuation at the kth measurement in terms of the standard e-representation. Since ¢1 is linear, w k = ¢lWe. We want to consider blocks of N measurements, and for the purposes of the next section, successive blocks of N measurements, each with the same successive inputs from the determining set {x 1 ,..., xn}. Equations (3.5) are then meaningful for all k = 1, 2,... if X~ = X j , j = k - - [k IN]N , where [a] denotes the integer part of a. Put
and
h n = hNN, yn
, ] Y(n-1) N+I Y'(~-.1) N+~],
y;N J (3.6)
I --XI(W(~_I)N+ I ~- f]O(n_l)N+ 2 + ''' -j- WnN--1) ]
-X~(w(~-1)N+2.+ " ' " + w~N-1) [, __ X N_ol WnN_ 1 I ]
(n+l) N--1 = )_~ g0k. W n
tc=nN
Then the successive blocks of N equations given by (3.5) can be written
hn+l =. h n @. w n,
y~ = Xh ~ + ~ = h n + ~, n= 1,2,..., (3.7)
since X is the NM × NM identity matrix. With the w,~, and hence the w n and e n, unknown, it is now impossible to find any of the h ~ precisely from observations of the y~; however, if each w ~ is small one would expect //n = y~ to be a pretty good approximation to h ~. We have the following simple result.
212 FISKE AND ROOT
LEMMA 3. In the model (3.7) we let wn satisfy only the condition I] w~ [[ ~ ~1, where [[ w~ Ii is the Euclidean norm of w, in RuM. Then h~= y~ satisfies It//" - - h" H ~ (N - - 1)~/, and no estimate of h" formed from y~ can have guaranteed error less than (N -- 1)~/.
Proof. It is sufficient to consider only y: in (3.7), that is, to consider k = 1,..., N, in (3.5). Given the same h:, the two following sets of values for the w~ yield the same observations y j . (1) w 1 = w 2 -- -- WN_: , 1[ Wl I]-----7, Wl G JV"L(X:) (the orthogonal complement of the null space of X:); (2) @: = @2 - - - - @u-: = - -w l - In fact, in either case y~' = Xkhl , k = 1,..., N, since all the w~ and ~ belong to W(X~) for k > 1. The values of h n for the two different cases differ by an element of norm 2(N -- 1)~/. This proves the second assertion. It is easy to see that the choice of w: ,..., wn_ 1 just given defines a worst case for ~n, and the first assertion follows immediately. |
The following theorem summarizes the discussion of the model (3.7) by stating to what extent identifiability of a time-varying system, meeting certain conditions, can be guaranteed. This is, of course, a statement of the situation when noise-free observations of the output are possible. It is preliminary to a corresponding result in the final section that takes account of noisy observations of output.
THEOREM l. Let (~ , H e, f ~) be an extended system satisfying (i),..., (viii) for some m > O. Suppose there is a class of systems 5P satisfying (a), (b), (c) with the same m and arbitrary T > 0 and such that the extended system satisfies (ix) with respect to 5C Finally let it be required that the changes in the extended system during one measurement interval are always such that Ii H~+I - - Hk ][ ~ 3 for all k.
Fix e > 0 and let N be the minimum number of elements in the determining set of a standard E-representation of $f. Then there exists an estimate Hk of Hk, for any k, formed from a preceding block of N measurements such that
11 H~ - H~ II ~< ~ + (N - - 1) N:?&
Proof. Consider the model (3.7) based on the standard e-representation of ~9 v. From Lemma 1 and the condition that II Hk+: - - H~ [[ ~ 3 for all k,
II ~v~ II ~< I14111 ]1 w~ II ~< N1/~&
Then from Lemma 3 the estimate ~n =y, satisfies [1/~ n -hn[ I (N -- 1) N:/28. Put H ~ = ¢:(/~); then from Lemma 2,
N H~ - - ¢:(h~)l] ~ (N -- 1) N:/23.
SLOWLY VARYING SYSTEMS 213
Since h n = ¢I(H.N), and (..cP1,41) is an E-representation, putting HaN = H '~ yields
I[ H,~v -- Hnx II ~ e • (N - - 1) N1/2~,
from the triangle inequality. However, since a block of measurements may be started anywhere, nN may actually take on any integer value, which proves the result. |
Remarks. (1) Conditions (i), (ii), (vi) on £r , together with the com- pactness of ~ , are compatible and not even very restrictive (see Proposi- tion 2.4 of Root, 1975b). The compactness condition of input spaces may seem confining to one used to dealing with linear system theory, but in a practical situation it can often be argued that potential inputs must indeed belong to a compact set because of very real physical constraints (see com- ments in Root, 1971).
(2) It is evident that even in the case of noise-free observations the identifiability of a time-varying system depends on a trade-off between actual rate of system variation on the one hand and the state of prior knowledge (specification of ~) and size of the class of admissible inputs (specification of W) on the other. This is indicated in Theorem 1 in the dependence of N on e, ~ and £r. It would be highly desirable to estimate the dependence of N on e for simple examples of W and 5, but no attempt is made to do this here. The problem is akin to estimating the e-entropy of a set, but is more complicated because of the sense in which the x~ are required to cover W.
(3) In many situations some simple ad hoc construction can be used to obtain a linear-in-the-parameters model of the form of Eq. (3.7) (but usually not with X = I), and the statistical estimation theory of the next section will apply to such models.
We now suppose that the observations of the output of the extended system are contaminated by additive noise. Then Eqs. (3.1) are replaced by
H~+I = H~ + W~,
zk = Hk(X~) + V~,
where V~ is a random variable 3 taking values in L2(O, T). Assume
(3.9)
That is, we take V~ to be a strongly measurable map from a complete probability space into L2(O, T) (see, e.g., Barucha-Reid, 1972).
214 FISKE AND ROOT
(o 0 E (Ve) exists and is equal to zero.
(fl) g~ has a covariance operator / ' (which will necessarily be self- adjoint and of trace class).
(7) V~ and Vj are uncorrelated for k =/= j ; i.e., the cross-covariance operator is zero.
The development leading to (3.7) can now be repeated with the only change being that in each observation equation there is an additive noise term. Let z~' be defined to be ~r(z~). Then, since ~r is linear, Eqs. (3.5) are replaced by
hk+l = h~ + w~, (3.10) zk' = X~hk + vk ,
where v k = ~V~ is represented as a column vector with respect to the basis {e 1 .... , era}. The random vectors v~ have mean zero, are uncorrelated and have covariance operator ~/~, which will be represented by the covariance matrix R. Using the same convention as before as regards superscripts versus subscripts leads to
h~+l = h~ + w~' (3.11)
where v n is a random vector with mean zero and covariance matr ix /~ = N-block diagonal [R i R : ... i R]. Furthermore, v n and v m are uncorrelated for n ~- m.
Equations (3.11) describe the model we consider for identification of a time-varying system in the presence of output noise. In the noise-free case one can compute ]~n = h n + en as an estimate of hn; in the noisy case one can make a statistical estimate f~n of h n + e ~, which is again regarded as an estimate of h% In both cases there is nonremovable error due to the presence of ¢~. In the noise-free case one block of N measurements suffices; in the noisy case continuing measurements allow for a reduction of statistical error, as usual. In the next section the statistical estimation problem is treated.
4. PARAMETER ESTIMATION IN THE TIME-VARYING SYSTEM MODEL
The estimation problem posed by (3.11) is a "mixed" problem which is somewhat unconventional; the terms w n and en are nonstochastic dis- turbances while the term v ~ is stochastic. Even if the e~ were zero, the
SLOWLY VARYING SYSTEMS 215
usual Kalman recursive linear least-mean-squared (LMS) error estimation theory would not apply because of the nature of w% Since such an estimation problem is at least of some academic interest, 4 we generalize it a little, considering however only the case e n ~- O, before constructing an estimate. This more general version also is of use for some time-varying system models formulated on an ad hoc basis with no reference to standard E-repre- sentations, as mentioned above. When the solution we obtain for the estima- tion problem with E ~ = 0 is applied to (3.11) an adjustment has to be made to account for the presence of e n and the extra error it causes, and this is done in the next section.
The criterion used for the quality of estimates is, as usual, mean-squared error. However, in the model used the mean-squared error depends on the values of the unknown, bounded, but nonstochastic quantities. What we would like to do is minimize the maximum mean-squared error. However, the recursive linear estimate constructed only minimizes an upper bound on the maximum mean-squared error, so a comparison lower bound is obtained to provide a guarantee that the estimation procedure is not far from optimum. No linear estimation procedure can yield an estimate with mean-squared error less than the number given by the lower bound for all possible values of the parameters.
We consider
h-+l = h,~ + w ~, (4.1)
z ~- -Xh ~+v '~, n= 1,2 .....
where again h ~ is the vector parameter to be estimated, but where the as- sumptions are now: h ~ is a p-dimensional vector; z n is an r-dimensional vector, r >/p ; X is arbitrary except that ~V(X) = 0; 5 v ~ is a random vector with first and second moments satisfying
Ev n = O,
Evn(v~) r = R~,~, (4.2)
4 Recursive estimation for models where the noise is unknown-but -bounded has been studied by Schweppe, Schlaepfer, Bertsekas and Rhodes, and others (see Schweppe, 1974, and other references given there). The estimator developed here is different f rom any previously discussed as far as we know.
That part of h that lies in the null space of X is not estimable anyway, so there is no real loss in generality in cutt ing down the parameter space, if necessary, to coincide with dV'z(x).
643/3z/3-z
216 FISKE AND ROOT
with the covariance matrix R strictly positive definite, and w" is bounded componentwise, as will be specified. Let {¢1}, i = 1,..., p, be a set of ortho- normal eigenvectors belonging to (XrR-1X) -1, i.e.,
(XrR-~X) -~¢ i = af ter , i = 1,..., p. (4.3)
Then it is required that ~
[¢~ rw" I = I(w", ¢~)1 ~< ~/~, i = 1,...,p; all n. (4.4)
Note that since JV ' (X ) : 0 and R is strictly definite, (XrR-1X) -~ exists and the eigenvalues {ai =} are real and positive.
A recursive estimator for h ~ is now developed. It is suggested by the nonrecursive modified linear unbiased minimum-variance (modified LUMV) estimator described by Root (1973); there is also a vague resemblance to the standard Kalman recursive estimator. To start, let h 1 be the ordinary LUMV estimate of h 1 using the observation z 1, i.e.,
h 1 = (XrR-xX) -1 XTR- l z I = Cz 1, (4 .5)
where the matrix C is defined implicitly by (4.5). As is well known (Albert, 1972), the mean-squared error in this estimate satisfies
E [[ h ~ - - h 1 [I 2 = EII C( xh l ÷ vl) - - hI II 2
= EII cv~ I[ 2 = Tr[(XrR-1X) -1
= E a* ~" (4.6) i=1
Here and in the rest of this section the norm is the Euclidean norm in R ~. It is convenient to introduce the notation b~ ~ aft, i = 1 .... ,p, so that
9
EII ha - - h ~ Ii 2 = Z b~,. (4.7) i=1
Now consider the second observation
and rewrite this as
z 2 = Xh ~ + v ~ = Xh 1 ÷ Xw 1 ÷ v 2,
z~ - xh l = X[(hl - - h~) + wq + v 2.
6 We occas ional ly use inner -product space notat ion in th is section.
(4.8)
notat ion instead of vector -mat r ix
SLOWLY VARYING SYSTEMS 217
Define 0 2~z 2 -Xh 1 and ~1~ (h ~-h t ) - /w l=h 2 - ]¢ so that (4.8) becomes
0 2 : X~ 1 -]- 7) 2. (4.9)
The LUMV estimate of ~1 is
CO 2 ~ ~1 + Cv 2. (4.10)
Equation (4.10) can be interpreted as describing a linear model for estimation in which the linear transformation is the identity, the "observation" is CO 2 and the "noise" is Cv 2. Since the LUMV estimate of ~1 in (4.10) is just the observation itself, there is no conflict in thinking of (4.10) this way. We now wish to use the a priori information that we have about ~, i.e., the information in (4.4) and (4.7), to modify the LUMV estimate of so as to obtain a new estimate with smaller mean-squared error.
Consider a completely arbitrary linear estimate ~1 of ~x in (4.10) and expand it in terms of the {~bi}. Thus,
~0 ~ = ~ a,,(CO 2, ~j) ¢,, (4.11)
i,j=l
where the {%} are real numbers. The error in this estimate is
~1 ~1 = ~ [(aii - - 1)(~1, ~i)"q- ~ ai.$(~ 1, t~j).q- ~ g/j(C'v 2, ~,)] $ i . i=1 J=l j=l
(~ei) (4.12)
Then, from (4.2) and (4.3), and using the facts that ~ is uncorrelated with v 2 and that CRC r = (XrR-1X) -1, it follows that
EII~I ~[12 E (a~, 1)(~ ~,$4)+ a~j(~,$~ .+_ ~ 2 2 i=1 ~=1 i,~=l
(J~) (4.13) We should like to choose the {ai~} so as to minimize the supremum of E II ~1 _ ~111~ for all h and all w satisfying (4.4). However this minimization problem appears to be extremely complicated, and perhaps not even worth- while in view of what follows, so we content ourselves with minimizing the obvious upper bound to (4.13) given by
[ ]' E11~1--~1[12--.<2~ (a i , - -1)2E[ (~l ,~i ) ]2+2~ E ~ aij(~l,$j) i=l i=1 J=l
+ y~ 2 2 (4.14) aijG ~. . i,J=l
218 mSKE AND ROOT
To minimize the right side of i @ j. Making this assignment
i=1 Now
z[(~ ~, ¢3] 2 = E[ (h ' - - h I +
SO
(4.14), one should obviously set ai~ = 0 for reduces (4.13) to
~o (aii __ 1)2 E[(~I, ¢,)]2 _? ~ a, ,~ ~ 2. (4.15)
i=1
= E[ (Cv l , ~i)2] _ 2(w 1, ~i) E[( C~I, ~i)] -~ E[( g})l, ~i)2]
<~ c,~ 2 + 2~7,,~ + ~ = (bli + Vi)2, (4.16)
1o 2 2 E II ~ - - ~1112 ~ ~ [(a. - - 1) 2 (bl, + W~)2 + a .~ 1. (4.17)
Obviously the bound in (4.16) could be improved to b~i + ~i 2, since the unbiasedness of ]21 causes the cross term to vanish. However, this calculation is used all over again in an inductive proof, and in general the preceding estimate h ~ is biased so the use of the Schwarz inequality is necessary.
The upper bound in (4.17) is minimized by putting
(bli + ~)2 i = 1 ..... p. (4.18) aii = (bli @ ~i)2 _~_ (yi 2 '
We let ~1 denote the estimate of ~1 given by (4.11) with aii as given by (4.18) and a,a = O, i @j . Since h 2 = h 1 @ (1, we take as our estimate o fh 2,
h2 ~ hi + ~1. (4.19)
It is perhaps interesting to note that the definition of h 2 in (4.19) is analogous to a key step in a standard derivation of the Kalman recursive least-mean- squared error estimate; the difference is that there the step is shown to lead to an optimum estimate by a projection argument, whereas here there is no such structure available. With the use of (4.17) a short calculation shows that the error in h 2 satisfies the following inequality,
EI] h~ - - h21] 2 = E II(h ~ + ~1) _ (hi + ~l)ll2
= E [l ~1 - - ~ II 2
i=l i=1
= X b~,, (4.20)
SLOWLY VARYING SYSTEMS 219
where
b~ A (b~ + ~;)~ ~? (bli + Vi) 2 + ~P "
Repetition of the procedure just described leads to the recursive estimate
h~, = hn--1 _1__ ~n--1
~=1 (b(n-1)i + ~i)2 + ai 2
b~ = (b("-l)i + ~)2 ~2 i = 1 ..... p (4.21) (b<._a)~ + n~) ~ + ~C" '
with initial conditions given by
h 1 ~ Czl~
b2i = Gz2, i = 1 .... ,p. (4.22)
THEOREM 2. Given the statistical model defined by Eqs. (4.1) and subject to the conditions of (4.2) and (4.4) and to ~A/'(X) = O. The recursive estimates defined by Eqs. (4.21) and (4.22) satisfy the error bound
23 E [I h" - h" ii z <~ ~ b~. (4.23)
Proof. At the kth stage the model (4.1) gives the observation equation
z k =Xh ~+v ~=Xhk- l+Xw ~- l+v k,
which can be rewritten as 0,~ = X~k-1 + v k,
where O k a= z~ _ Xhl. 1 and
~7~-i ~ (h~-i __ hk-i) _}_ wk-i = h k __ hk-i.
From these relations Eqs. (4.21) follow for n = k exactly as they did for n = 2 and the calculation of the accompanying error bound (4.23) also goes the same way. The assertion follows by induction. |
The behavior of the error bound in (4.23) as n --+ oo is of concern. Before presenting a result for this limiting case we state without proof a simple preparatory lemma (see Fiske, 1975b).
220
LEMMA 4. and satisfies
(i) lim~_~0+f(x ) > 0,
(ii) lim~_~ f (x) = constant < ~,
(iii) i f (x) > O, x > O,
(iv) f"(x) < O, x > o, then~
(a) (b)
FISKE AND ROOT
I f f is a real-valued C ~ function defined on the positive half-line
f (x ) = x has a unique solution x* for x > O,
the sequence xn = f(x~_z), xl > O, converges to x*.
THEOREM 3. The equation
(xl/2 + ~i)2 °l~ = x, x > 0, (4.24) (xll ~ + m) 2 + c~ ~
with ~ll, ai > O, has a unique solution xi*. For the recursive estimates of Theorem 2, l im,~ b~i = xi*.
Proof. Follows from Lemma 4 and (4.21). I
A graph of xi* is shown for certain values of ~7i and a i in Fig. J.
Upper Bound (Recurslve)
/ / /
%- I / /Lower Bound 0.2I / / {X=I,R Diogonal)
oov 0 0.1 0.2 0.3 0.4 0.5
~/i/°'i
FIc. 1. Upper and lower bounds on MSE (ith component).
SLOWLY VARYING SYSTEMS 221
EXAMPLE. For a system with input-output relation charaeterizable by a linear integral operator, or more generally by a polynomial integral operator (Volterra polynomial), there is an easy and obvious way to get a linear- in-the-parameters model of the form of (4.1) to which Theorem 2 applies. We indicate how this goes for a second-degree polynomial, and then consider details in a simple special case. It will be clear how this procedure generalizes to higher degree polynomials. Two comments should be made, even though they are obvious. First, some approximation is necessary to reduce the model to a finite-dimensional one; second, this is an example of an ad hoc system model and has nothing to do directly with e-representations.
Suppose the system is described by
~r+T er (r+T) y(t) = ,-m hl(t , s) x(s) ds + "~t[['r-m) h2(t' sl' s~,) x(sl) x(s2) ds 1 ds2,
r~t~r+T, r~R 1, (4.25)
z(t) = y(t) + v(t), (4.26)
where the kernels h 1 and h~ are square-integrable in all their arguments over any bounded set; where x is square integrable on any finite interval; where rn ~ 0, T > 0 are arbitrary but fixed, and where v(t) is a wide-sense stationary stochastic process with mean zero. The pair (hi, h2) represent the system, x is the input, v is noise, and z is the observed output. T is the duration of an observation interval and m is the maximum duration of the finite memory.
Let r be temporarily set equal to zero; let {¢i} and {~i} be complete orthonormal systems (c.o.n.s.) for L~(--m, T) and L2(0, T), respectively. Expansion of both sides of (4.25) with respect to these c.o.n.s, yields
cz = ~ b~,a i + ~ b~ilqa,fli2, k = 1, 2 ..... (4.27) i il,i ~
where
. . = (x, ¢ . ) ,
T T
T T T b/cil'/] = fO ~m ~m h2(t' Sl, $2)¢i1(31)¢~:2($2)~k(t) dsl ds~. dt.
222 FISKE AND ROOT
Since it may be assumed without loss of generality that h 2 is symmetric in s 1 and s2, it may be assumed that b~hi, = bki~q. Under rather obvious compactness conditions on the class of systems and class of inputs in question (e.g., if h 1 , h~, and x are required to belong to compact subsets of the appropriate L l spaces) Eqs. (4.27) can be approximated uniformly in an L ~ sense by a finite system of equations in finitely many unknowns:
M M ck = E bk~a' q- E bk,,,,ailai=, k = I .... , K. (4.28)
i=1 i1,~2=1
These equations can be written in the form of the linear vector-matrix equation y = Xh, where h is the vector of the b's, y is the vector of the c's, and X is determined by the a's.
Equations (4.28) correspond to one measurement, as the term is defined above, at ~- = t 1 = 0. Successive measurements can be made independently at t~ ~ m q-T , t a = 2(m q-T) , etc. With the c.o.n.s, translated appro- priately, each measurement will correspond to a system of equations (4.28) with the input for that measurement determining the a's and the condition of the system during that measurement determining the b's. I f the b's do not change for a sufficient number of measurements, input signals can always be chosen so that the b's can be uniquely determined (see Fiske, 1975b).
Now we suppose that the system in question is causal, and we temporarily also assume it to be time-invariant. The ¢~ and ~/~ can then be chosen to take advantage of these conditions. In particular, define ¢1 ,..., eM by
( m ~1[2 (i - - 1)(T "4- m) i (T + m) ¢~(t) = \T - - -~] ' - -m+ M ~<t<- -m+ M '
= 0, for all other t ~ [--m, T].
The completion of the system {¢i} is irrelevant. There is no harm done if m is increased if necessary so that it is exactly divisible by 8 = (T + m)/M. The ~7i can now be defined to be functions consisting of a simple step of length 8, just as are the ¢i. However, because of the causality, bounded memory and time-invariance there is nothing lost by stopping the observation interval at 8 and defining only ~71 as a step of height (1/8)1/~ on [0, 8].
Thus, T=3, K= 1, M=(m/8)+l , and Eqs. (4.28) reduce to the single equation
M M
C --~ 2 biai ~- 2 bijaiaj, (4.29) i=1 i,3=1
SLOWLY VARYING SYSTEMS 223
where
c -- ~1/~ y(t) dt,
b4 = ~ -~+(~-1)~ fo h(t, s) dt ds,
1 f-m+j f~h( t , s l ,s2) dtds lds~ " b4j = ~ --fa+(4--1)~ --'m+(J--1)~ JO
It will be observed that, since b~j = b~i, there are N = ½M(M + 3) un- known b's in (4.29). We therefore consider a block of N successive measure- ments with N different inputs (i.e., N different sets of the ai) chosen so that the b's are uniquely determined. Using a superscript to denote the number of the measurement, we then have from (4.29)
M M c '~=~b,a#+ ~ b4,a4~a/~, n=l .... ,N . (4.30)
4=1 z,]=l
Equations (4.30) can be rewritten in vector-matrix form,
y =- Xh, (4.31)
where
y ~--~ [c 1, C 2 ..... cN] T,
h = [b 1 , b n ; bz, b2~ ;...; bM, bMM ; 2bn, 2bla .... ,2h im ; 2b2a ,...; 2bM_I,M] T,
and a 1 a 1 ] a l 1, (al l) 2, a21, (a21) 2,-.-, (a2) 2, alia21, ..., M-1 M
X ~ ... o , ' " , aN N ! al N, (alN) 2 .... , alNa2 N M_laM_I ]
From (4.26) and (4.31), there follows the equation
z = Xh + v, (4.32)
where v = Iv 1 ..... v~] r, and v 4 is equal to 3-I/2 times the integral of the noise over the observation interval of the ith measurement. Ev 4 = O, E(v4)2 = constant = a 2, say; and we shall assume E(v4vj) = 0 for i =~j. Thus, E(vv r) = a2I.
I f it is now presumed that h (and hence the b's) changes somewhat from one block of N measurements to the next, and if the same cycle of inputs
224 FISKE AND ROOT
is used for each block of measurements, we recover Eqs. (4.1), where the superscript n denotes the number of the block of measurements. Theorem 2 applies with R = a2I and
C = (XrR-1X) -1 XTR -1 = X-L
As an illustration, consider the simplest nontrivial case with M = 2. Then N = 5 so that five measurements to a block are necessary. The inputs may be chosen so that
[ i l l 0 0 i l - - 1 0 0
X= 0 1 1 . 0 - -1 1 1 1 1
Then,
C =X -1 =
_2o2 o oO 1 0 2 --2 . 0 2 2
4 0 --4 0
The eigenvalues and corresponding eigenvectors of (XrR-1X) -1 = 0-2(XTX) - I are
0-13 = 3.33~ ~, 0-2 2 = 10.2
0.32 = ½0-2,
0"43 ~--- 10-2 ,
%3 _ 0.150-3,
¢1 = [0.166, 0.166, 0.166, 0.166, --0.944] r,
¢5 = (1/21/~)[-1, 1, 0, 0, 0y , ~z = (1/21/2)[ 0, 0, --1, 1, 0] r,
¢, = ½[1, 1, --1, --1, 0] r,
¢5 = [0.474, 0.474, 0.474, 0.474, 0.331] r.
The ~/i remain to be chosen. If limits on the possible variation of (b 1 , bn, b2, b~2,2b12 ) between blocks are taken to be (0 a , 02 , 08 , 0a, 05), then the ~/i should be
~/i = max [%, %, %, %, o~5]T [¢i],
In any given situation the choice of the ~h will probably be something of a guess. If they are chosen conservatively (too large), then the estimates will not be quite as good as they might be; if they are chosen too small the error bounds will be incorrect. Suppose in this example the ratio ~1i/0- =
SLOWLY VARY ING SYSTEMS 225
0.5. Then ~i/171 = 0.28, and from Fig. 1 one sees that the upper bound on the final mean-squared error in the ¢l-component is approximately 0.5a1~ = 1.7a 2, as contrasted with a12= 3.33a 2, the error given by the LUMV estimator.
Lower Bound on Maximum Estimation Error
As previously mentioned, since the error bound just derived is presumably not the best possible, it is important to establish a lower bound on upper bounds to mean-squared error, for comparison. This is done by a contradic- tion argument using standard results from the Kalman linear least-mean- squared error estimation theory. The result is for linear estimators only.
Suppose that all the conditions for Theorem 2 are satisfied, and in addition each w ~ is a stochastic vector satisfying the conditions
P[( w~, ~) = ~d = 1/2 = P[(w ~, ~) = --~i],
E[(w '~, ~)(w ~, ~j) = ~i2~j8~ , (4.33)
E[(w ~, f ) (v m, g)] = 0,
for i, j = 1,..., p, all m and n, and all f and g ~ R ~ in the last equation. Note that these additional conditions in no way violate the original ones. Then Ew ~ = 0 and w ~ has covariance Q satisfying
(Q~bi, ~b~.) = ~8~j . (4.34)
The equations (4.1) can now be interpreted as representing a controllable, observable linear dynamical system model with uncorrelated state and observation noise. To apply the linear LMS recursive estimation theory to (4.1) it is necessary to assume that h 1 is a stochastic vector with known mean and covariance. It is convenient to take
Eh 1 = Cz 1 .~ h 1,
cov(h 1 - - Eh ~) ~ PI = (XTR-~X)-h (4.35)
The error covariance matrix for the LMS est imate/~ of h ~ given the observa- tions zZ,..., z ~ and initial data as in (4.35) satisfies the recurrence equation
P,~ = (P,_a -~ Q)(I - xr[x(P~_~ ~- Q) x r @ R] -~- X(P~_~ q- Q)},
n = 2, 3 ..... (4.36)
(See, e.g., Astr6m, 1970, with P1 given by (4.35).)
LEM~A 5. The sequence {Tr P,~}, n = 1, 2,..., with P,, given by (4.35) and (4.36) is monotonically nonincreasing.
226 FISKE AND ROOT
Proof. We note first that Tr P2 ~< Tr P1. In fact, given the data {z2; hi, P1}, Tr P2 is the mean-squared error of the LMS estimate h 2. But the LUMV estimate of h ~ has mean-squared error Tr[(XrR-1X)-I] = Tr P1. Since this is another linear estimate based on (some of) the same data it cannot have smaller mean-squared error than the LMS estimate. Hence Tr P~ ~ Tr P1.
Now observe that because of the stationarity of the statistics in the model, the LMS estimate of h n given {z 2 ..... z~; hi, P1} has the same error covariance matrix, P~, as does the LMS estimate ofh n+l given {z 3 ..... z~*+l; it~ = Cz2, P1}. Thus, given {z2,..., zn+1; h I, P1}, one can form a (nonoptimum) linear estimate of h ~+1 by first computing the LUMV estimate ]z 2 = Cz 2, which has error covariance matrix P~, and then replacing /~2 in the recursive LMS algorithm by h 2. This is equivalent to finding the LMS estimate of h n+l given {zz,..., zn+l; ]~2, P1}. By the remark just made, the mean-squared error of this estimate is Tr P~, which must then be greater than or equal to the minimum mean-squared error Tr P~+z. |
Let e ~ lim~oo Tr P~+I, then e is a lower bound on uniform upper bounds on mean-squared error. More precisely, we have:
THEOREM 4. Let the conditions of Theorem 2 be satisfied. Then for any linear estimator of the system parameters h n and for any m ~ 1 there exists an admissible sequence {hn}, n = l, 2,..., m such that the mean-squared error in the estimation of the h ~ is greater than or equal to e.
Proof. We are free to suppose that h 1 is a stochastic vector with covariance Pz and that the w ~ are stochastic vectors satisfying (4.33) and bounded as required. Denote an arbitrary linear estimate of h ~ by /~. Suppose that for some m ~ 1
E{II ~ - - h~ I12 I h 1, wI,.-., w ~-l} < e,
for every h 1 and every sequence {w 1 ..... w m-l} meeting the conditions just imposed. Then
g I1 h~ - h ~ II 2 = E[E{Jr h~ - - h ~ ]]2 ] h 1, wl , . . . , w~- l} ] .
But by Lemma 5 the LMS estimates of the h ~ for the given conditions all have mean-squared error Tr(P,,) /> e. Thus, by the minimum mean-squared error property of the LMS estimates there is a contradiction. |
For the very special case that X is the identity this lower upper bound can easily be computed. We compute it here because it gives an immediate
SLOWLY VARYING SYSTEMS 227
comparison with the upper bound of Theorem 2, and also because, simple as it is, it is the case of interest in the general theorem of the last section.
With X = I the ¢ i , i = 1 .... ,p, are orthonormal eigenvectors of R. Thus, if they are chosen as a basis for a vector-matrix formulation, R is the matrix diag[a12 ..... %2], and from (4.34) Q is the matrix diag[~?le,..., %2]. Furthermore, (4.36) then becomes, after a small amount of manipulation,
P~ = (P,~_~ + Q)(P~_~ + Q + R)-~R, n = 2, 3 ..... (4.37)
Since/)1 = R is diagonal, each P~ as given by (4.37) is diagonal. In particular, if p,~_ 1 diag[fl~_m, ~ 2 ~ 2 ,..., fi(~-l)~], then P~ diag[fi~l ]3~ 2 = ~] , = ]~(n--1)2 , ~'" ", where
2 2 2 = ~ ~=-~2 , i = 1 .... ,p. (4.38)
P(n-1)i ~i ~- i
It follows from Lemma 4 that the fi~, converge monotonically to the fixed point of the function
(X @ 7]i 2) O'i 2 X ~ 0, f (x ) = x -+- 7Jig -~ (Yl 2 '
which is
Thus,
0.2 1/2 --.~vi + 1 + 4 ~qi ° (4.39)
P
e = lim Tr(Pn) = ~ /5i 2. (4.40) n~oo
See Fig. 1 for a comparison with the upper error bound for the recursive estimator.
5. IDENTIFICATION OF A SLOWLY VARYING SYSTEM
WITH NOISY OBSERVATIONS OF OUTPUT
As previously indicated, there is really no way to distinguish between h ~ and e n in (3.11) in estimates based on z n. So, in analogy to what was done in the noise-free case, we estimate h ~ + e ~ and simply allow for the error possibly introduced by E n when the estimate obtained is used as an estimate of h n. From (3.7) yn = h,~ + En, so (3.11) can be rewritten in the form
y~+l = y~ + V~, (5.1) z n = yn + v s.
228 FISKE AND ROOT
Since [1 r ~ [] = I](Y ~+1 -- hn+l) + ( hn+l - - hn) + ( h~ -- Yn)l] it follows from Lemma 3 that if lI wn ][ < ~/then
II ~" I1 ~< 2(N -- 1)~/+ N~ = d, (5.2)
where d is defined-by (5.2). We define an estimate for ]z ~ of h n in (3.11) to be the estimate ~" of y~ in (5.1) specified as in Theorem 2. The following theorem is an extension of Theorem 1 that accounts for the statistical errors in the system parameter estimates when the output observations are noisy.
THEOREM 5. Let (~, H a, Y£~) be an extended system satisfying all the conditions of Theorem 1. Suppose that the output for each measurement is observed in the presence of additive noise V~, as in (3.9), and that the Hilbert- space-valued random variables V~ satisfy the conditions (a), (fi), (7)following (3.9). Fix ~ > 0 arbitrarily and let N be as in Theorem 1. Then, given any ~' > O, there exists an estimate I:I~ of H~, for any k, formed from a finite sequence of blocks of N measurements such that
tl ~q- - - H . l[ ~< E + (X - - 1) N1/23 + ~, (5.3)
where ~ is a nonnegative random variable satisfying
MN
E~ ~ <~ ~ Bt 2 + d. i=1
The length of the sequence of blocks depends on #. Each constant B i iS the unique nonnegative solution of
(B~/2 + d) ~ ai ~ (B 1/~ -k d) ~ + a~ 2 = Bi , i = 1,..., MN, (5.4)
i
where the ~2, i = 1 .... , MN, are the eigenvectors of ~F~r (each ai 2 is repeated N times), and where
d = 2(N - - 1) Na/23 + N*/2& (5.5)
Proof. We maintain the convention that superscripts refer to blocks of measurements. Put /~"= ~bl(h" ) = ~bl(#" ). From Theorem 1 it follows that if the estimate of 3~ n is perfect, /]r~ = Hn, where H n satisfies (3.8). From Lemma 2 it follows that if ]1 h n - - yn [[ ~ ~, then
II-'q'~ - Hn II ~< [I B '~ - H '~ It + II Er- - - H,~ II
~< ~ + E + (N - - 1) N1/~,
SLOWLY VARYING SYSTEMS 229
since $1(h n -- yn) =/4~ - - H% This much is true no matter what estimate is used for hn; however, consider h n as defined above. From Theorem 2
~9 E I1 h- - h~ lj ~ ~< Z b~,,
where now p ~ MN, and where the b~i are defined recursively by (4.21) and (4.22). Since X -~/ , the ai ~ in these equations are now simply the eigenvalues of the covariance operator R = ~r_P~r. Since (5.2) is to hold, each ~?i, i = 1 .... ,p , is replaced by d, which, in terms of S is as in (5.5). The assertion then follows from Theorem 3 for H n, which corresponds to the nNth measurement. The conclusion also follows for any H,~ ; one can simply delay the beginning of the first block of measurements by the appropriate amount so that n = kN for some k. |
RECEIVED: May 30, 1975; REVISED: January 19, 1976
REFERENCES
ALBERT, A. (1972), "Regression and the Moore-Penrose Pseudoinverse," pp. 86-93, Academic Press, New York.
ASTR~SM, K. J. (1970), "Introduction to Stochastic Control Theory," pp. 225-237, Academic Press, New York.
BARVCHA-REIo, A. T. (1972), "Random Integral Equations," pp. 14-16, Academic Press, New York.
DIEUDONNI}, J. A. (1960), "Foundations of Modern Analysis," pp. 126-129, Academic Press, New York.
FISKE, P. H., ANn ROOT, W. L. (1974), Estimation of parameters in systems of relatively slow time variation, in "Proceedings of the Twelfth Annual Allerton Conference on Circuit and System Theory," pp. 1039-2047, Univ. of Illinois Press, Urbana, Illinois.
FISKE, P. H. (1975a), Modeling and identification of time-varying systems from noisy observations, in "Proceedings of the Johns-Hopkins Conference on Information Science and Systems," The Johns Hopkins University, Baltimore.
FISKE, P. H. (1975b), "Modeling and Identification of Relatively Slowly Varying Systems," Ph.D. Thesis, The University of Michigan.
ROOT, W. L. (1971), Some general structure theory of systems to be used in identifica- tion and measurement, in "Proceedings of the Fifth Annual Princeton Conference on Information Sciences and Systems, pp. 23-19, Princeton University, Princeton, N. J.
RooT, W. L. (1973), On the modelling and estimation of communication channels, in "Multivariate Analysis, III" (Krishnaiah, Ed.), pp. 61-78, Academic Press, New York.
230 F~SKS AND ROOT
ROOT, W. L. (1975a), On the modeling of systems for identification, I: C-Representa- tions of classes of systems, S IAM J. Control 13, 927-944.
ROOT, W. L. (1975b), On the modeling of systems for identification, II: Time-varying systems, S IAM J. Control 13, 945-974.
SCHWEPPE, F. C. (1973), "Uncertain Dynamic Systems," pp. 154-160, Prentice-Hall, Englewood Cliffs, N. J.
Identifiability of Slowly Varying Systems*
PHILIP H. FISKE
The Analytic Sciences Corporation, Reading, Massachusetts 01867
AND
WILLIAM L. ROOT
Aerospace Engineering Department, The University of Michigan, Ann Arbor, Michigan 48109
A system is conceived of as being slowly varying if it changes slowly enough to permit identification to within a specified error. A generic model is developed to study the identifiability and identification of slowly varying systems. The model is suitable for a large variety of nonlinear, time-varying, causal, bounded memory systems; it has finitely many parameters and is linear in its parameters. Results are obtained with the use of this general model that give guaranteed accuracy of identification as a function of the prior knowledge of the unknown system, the maximum rate of time variation of the system, and the characteristics of output observation noise. To derive these results, a recursive estimation procedure is developed for time-discrete linear dynamical system structures in which the observation noise is statistical but the dynamic equation noise is nonstatistical and is known only to be bounded.
1. INTRODUCTION
Our purpose in this paper is to investigate the identifiability of systems that are changing slowly with time in an unpredictable way. Except in trivial situations, if a system is not time-invariant, a finite-time record of input and output data cannot by itself yield a precise identification of the system, even as it existed during the period the data were taken. The response to the one particular input or sequence of inputs used will be known, but what the response would have been to other inputs cannot be found in general. Even less can be known about the future behavior of the system, unless there is auxiliary information. However, it is reasonable and it is
* Research supported by Air Force Office of Scientific Research, Air Force Systems Command, USAF, under AFOSR Grant Number 72-2328.
201 Copyright © 1976 by Academic Press, Inc. All rights of reproduction in any form reserved.
202 FISKE AND ROOT
common practice to try to identify the unknown systems that are slowly time-varying as if they were time-invariant, updating the identification from time to time in order to track the changes. Such an approach is valid if the system is changing slowly enough to permit sufficiently good approxi- mate identification for the purpose at hand. The problem we address is: How fast can the system change and still permit satisfactory identification from input-output data ? The answer to this question depends of course on the criterion of satisfactory identifiability. It also depends on the extent of prior knowledge of the system (e.g., is the system known to be linear ?) and on the characteristics and state of prior knowledge of observation noise.
Identifiability can be studied in terms of any system model that is capable of providing a sufficiently faithful representation of the unknown system in question. We choose a particular kind of generic model (i.e., a model with various undetermined parameters) that has the properties of being extremely flexible and of being linear in the parameters. The flexibility of the model makes it applicable to a wide class of systems, nonlinear as well as linear. The linearity in the parameters means that the identification is always a linear estimation problem, which is advantageous when general error formulas are desired that will show the interplay among the factors that control the identifiability. The essential restrictions on generality imposed by the model structure we use are: (1) Input observations must be noise-free, and (2) the unknown system must have finite memory less than or equal to some known bound. Since the theory is an approximation theory, it is not critical in the final interpretation that these conditions be exactly satisfied; however, the mathematical analysis is based on these assumptions.
The generic model is developed in the next two sections. It is based on the notions of classes of systems and their e-representations. In Section 3 a theorem is stated about the identifiability of time-varying systems with noise-free observations of output. This is a relatively simple result which is preparatory to the main identifiability theorem, Theorem 5 of Section 5.
A recursive statistical estimation procedure is developed in Section 4, which is also preparatory to Theorem 5. This procedure applies to a mixed linear model with both stochastic and unknown-but-bounded terms, and is new as far as we are aware. Since the estimation problem is perhaps of some interest in itself, we have treated it a little more generally than is necessary for the application in Section 5.
Some of the results of this paper were presented in the conference papers (Fiske and Root, 1974; Fiske, 1975a); see also Fiske, (1975b), where there is further related material.
SLOWLY VARYING SYSTEMS 203
2. PRELIMINARIES CONCERNING e-REPRESENTATIONS
The concept of e-representation is introduced and developed somewhat by Root (1971, 1975a); the use of e-representations in identification is discussed abstractly by Root (1973). For the convenience of the reader a summary of those definitions and results concerning e-representations which are needed is provided here.
A class of bounded systems is the 4-tuple 50 = (~/, f, O, ~), where ~ is a Banach space, O and 2~ are metric spaces, and f is a continuous map from the topological product O × ~ into Yg, bounded on 5~. W, 0, and 5F are called the output space, system parameter space, and input space, respectively. If ~ E 0, s = (Y/, f, ~, 5~) is a system belonging to 5 °. The equation y = f(a, x), x ~X, describes the transformation of inputs to outputs for the system s.
A class of systems 50 is prelinear if 6g is a subset of a linear space and
f(ca~l -k c2~2, x) : clf(~Xl , X) @ c2f(~x2, X)
for all x c W whenever the scalars c 1 , c 2 and the parameters ~1, c~2 are such that all three terms are defined. Note that the systems in a prelinear class need not have any linearity properties.
Let ~" = ~-(W, ~J) denote the set of all bounded continuous maps from into ~J made into a Banach space in the standard way (see, e.g., Dieudonn6,
1960); i.e., if F E Y , [] F ]l is defined by [] F ]l = sups. [I F(x)ll, where the norm on the right is the norm in ~J. Define g: ~ × W --+ ~ by g(F, x) = F(x). Then g is continuous on ~ × 5~ and actually linear in ~-. Thus 50~ = (~, g, ~ , W) is a linear class of bounded systems.
Note that f(~, ") defines an element, call it HE, of J ( f , ~). Let ¢ be the map from O into ~(2~, Y/) defined by ¢(~) = HE, and put ~f' =a ¢(0). Then 500 = (~/, g, g/t°, YY) is a class of systems, and is a subclass of 50o~. 500 is called the natural representation of 50 = (~, f, 0, ~) and ~b the natural mapping for the class 50. 500 is always a prelinear class, whether 50 is or not. I f 50 is itself a prelinear class then it follows that ¢ is a restriction of a linear map from the linear span of O into ~. I f ~b is injective and O and ~ are compact metric spaces, then 50 is equivalent to its natural representation 500 (see Root, 1975a, for definition and details) and nothing is lost by con- sidering 500 directly.
Let ~ = (Y/, f l , O1,5~) be another class of bounded systems with natural mapping ~b 1 . I f there exists a map ~1 from 9f' into O 1 such that [l H - - ~b 1 o ¢I(H)I[ ~< e for all H ~ ~t', (501, ¢1) is said to be an e-representation of ~0 (and also to be an e-representation of any class for which ~0 is the
204 FISKE AND ROOT
natural representation). The definition does not require that ~b 1 o¢1(H ) necessarily belong to W. It does require that 50 and 5:1 have the same input and output spaces. An e-representation (5:1, ¢1) of -5: is linear if 5:1 is a prelinear class and ¢1 is a restriction of a linear mapping; it is finite- dimensional if 51 is a subset of a finite-dimensional Euclidean space; it is continuous if both ¢1 and ¢1 are continuous; it is determined by ~o, Y'o C ~, if the map ¢1 depends only on the functions H, H E W, restricted to ~0 • The basic existence theorem is the following.
THEOREM (Root (1975a)). Let ~/ be a Banach space and 5 and ~ be compact metric spaces. Let 5: = (~, f, 5 , f ) be a class of bounded systems. Given e > O, there exists a continuous, finite-dimensional e-representation (5:1, ¢1) of 5: that is determined by a finite subset fo C Y:. 5:1 is a prelinear class. I f ~t is a Hilbert space, then in addition (~1 can always be a linear map so that (5:1, ¢1) is a linear e-representation. 1
The proof of this theorem is by construction and the formulas given by the construction are needed here; so some further discussion is necessary. It is shown that one can find a finite-dimensional subspace ~ ' of ~' that has a subset arbitrarily close to the total image set Y{( f ) C Y/. A continuous map ~r from ~ into ~/' is devised that carries points in d/f (W) into sufficiently nearby points of ~t,. In case ~' is a Hilbert space, ~r can be taken to be the orthogonal projection on ~ ' , and this is the only case we consider. Letting y ' = ~ry, y ' is expressed in terms of an arbitrary basis {e I .... , eu} of ~/'
M y '= ~, e,*(y ' )e , , (2.1)
i=l
where the ei* are continuous linear functionals. Now if the determining set f0 = {xl .... , xx} , the parameter space 51 of the c-representation is a subset of RuM. The map ¢1: ~- -+ 51 is defined as follows. ¢1(H) = h, where h is that point in R gu with coordinates, with respect to an arbitrary ortho- normal basis, given by
am~ = e~* o 7r o H(x,), m = 1 .... , M; n = 1 ..... N. (2.2)
A set of continuous interpolation functionals {yl(x),..., ylv(X)}, x e f , is constructed. These functionals depend on the set {x 1 ,..., X2v}, and have the properties
1 It can happen that ¢1 is l inear when ~d is only a Banach space, but it is not known to us whether this can be guaranteed.
SLOWLY VARY ING SYSTEMS 205
(1) 7.(x) ~0, n= 1 ..... N, xE~, N
(2) Z,,=, y,~(x) -= 1, x e W,
(3) 7,,~(x~) = 8,~, n, k = 1,..., N .
Finally, the map f , is defined by
f , (h, x) = 7,~(x) am %. (2.3) m=l n=l
An e-representation of the form just described is called a standard e-repre- sentation. It depends on the set {xl ,..., xN} , and that set is called its determining set.
The integers M and N depend on f , dr' (or 0/) and e, but are not uniquely defined simply by the condition that (~1, ¢1) is a standard C-representation. However, given f , 3/d, and e there is a smallest N such that there exists a standard e-representation with a determining set with N elements (we are not concerned with the size of M). With fixed ~0 and W this minimum N is nondecreasing as e is made smaller; with fixed ¢ the minimal N is nondecreasing as either f or 35 ° is made larger in the sense of set inclusion.
Two simple preparatory lemmas are needed that are not given in the references.
LEMMA 1. For a standard e-representation with ~ a Hilbert space and {era}, m -~ 1 .... , M, an orthonormal set, ~ is a restriction of a linear map from o~(97, ~) -+ R NM with linear operator norm ~N1/2.
Proof. Let R, M be M-dimensional Euclidean space with basis {e, , . . . , eM} , and let the image of 3(¢ ~ given x~ under the mapping defined by (2.2) lie in R~ M. Let R MN = R1 M @ "'" (~) R~ M, and assign the basis {emn}, m = 1,..., M; n = 1,..., N, where em~ is the element of R MN corresponding to e~ in R, M. Then with respect to this basis, let h = ~,(H), h' ---- q~,(H') have coordinates {a~}, {a~} in R NM, respectively. From (2.2),
(a,~ - - a~,~) % II H(x.) - H'(x.)ll ~
II H -- H ' II ~, n = 1,..., N .
Hence,
llh - - h'l] ~ ~ NIl H - - H'[[~
from which the assertion follows. |
206 FISKE AND ROOT
LEMMA 2. With the same hypotheses as for Lemma 1, ¢~ is a restriction of a linear map from R NM -+ ~-(£r ~) with linear operator norm bounded by one.
Proof. Because of the proliferation of norms in different spaces which appear, subscripts are used with the norms initially, except for the linear operator norm of ~b 1 which is denoted 1 ¢1 ]. Then,
I ¢1 I = sup II ¢1(h)1l R~ 11 h II~NM
= sup sup IlA(h, x)lle~ RNM .~
= sup s 7.(x) a~. R NM u.vP ~a=l 1
,} em ] R M
~< sup 1 sup [ a,~,~ [ R NM "~" \~'t=l
~< sup 1 sup (x) • max [am~ RNM X 1 n
l = sup I R NM N
because ~,~=1 y~(x) = 1 for all x ~ :T. Then
E~=l max. l a~. I¢11 ~< sup EM ~.N a ~ " ~<1. |
3. A MODEL FOR TIME-VARYING SYSTEMS
Some further notations, conventions, and definitions are useful. In what follows we deal with spaces of functions (or equivalence classes of functions) on R 1 or subsets of R1; in each case the functions take values in a finite- dimensional Euclidean space. L2(a, b) denotes the L2-space formed from functions defined on (a, b) and square integrable Lebesgue. We shall identify the spaces L2(a, b) and L~(t + a, t + b), t any real number, without further comment. The symbol A will sometimes be used as a generic symbol for
SLOWLY VARYING SYSTEMS 207
an interval in R 1 (of the form (a, b], if the endpoints matter). The linear operators L~, Pa and P~ are defined on functions x on R 1 by
(Lax)(t) ~- x(t + a),
(Pox)(t) --- x(t), t ~< a,
-= 0, t ~ a,
(P~x)(t) -~ x(t), t ~ A,
=0, t~A.
The symbol W e denotes a set of functions x on R 1 that satisfies the conditions
(i) (shift invariance) x ~ fe ~ Lax ~ We for all a ~ R1;
(ii) (projection property) x ~ W e ~ Pax ~ We , ( I - - Pa)x ~ f ~ and Pax ~ fe for all a and A.
To avoid confusion between different uses of the word system, we introduce the term extended system to represent the kind of ongoing, in general time- varying, system that is to be identified. Precisely, an extended system is a triple (a# e , H e, We) where &r e is as just defined, ~ is a linear function space on R 1 that also satisfies
(iii) shift invariance, and
(iv) projection property.
H e is a map 2 from W~ into oy~. The extended system is causal if PaHe(x) --= P~He(Pax) for all x a We and all a. It has bounded memory m if ( I - - Pa) He(x) = ( I - - Pa) H~[(I - - Pa_~)(x)] for all x a We and all a. These can be combined: An extended system is causal with bounded memory m if and only if for every T > 0
(v) (Pt+r - - Pt) He(x) = (Pt+r -- et) He[(Pt+T - - Pt_~)(x)] for all t and all x ~ We (see Root, 1975b).
The idea we follow in modeling a natural system operating for an indefinite time period is first to represent it as an extended system with certain addi- tional restrictions on We, ~/e and H e, and then to regard the extended system as corresponding to a trajectory in some fixed class of systems, as defined in Section 1. We choose to set up the problem so that the topological function
2 In Root (1975b), spaces satisfying the conditions for £r e and ~/~ are topologized; the H e are taken to be continuous maps, and classes of what are here called extended systems are formed. A much more elaborate structure is developed than is necessary for our purposes here.
208 FISKE AND ROOT
spaces that appear (this does not include £r and g¢~) are LZ-spaces. This is not necessary, but it seems sufficiently general and it facilitates the statistical estimation discussed in the next section.
Let T and m be fixed positive numbers. T will be the duration of each observation interval; m will be a bound on the memory duration. The number m is chosen to fit the problem; either the natural systems in question have bounded memory in, or they can be approximated sufficiently well by having the memory truncated to m. T is arbitrary. Let 5¢' = (go, f, 6g, ~) be a class of systems satisfying the conditions
(a) W is a compact subset of L2(--m, T),
(b) ~/is L2(0, T),
(c) 6g is compact.
We deal with the natural representation of Y, ~o = (q/,g, 2/f, 3?). It follows, (c'), that 3¢g is a compact subset of ~(.gr, g/).
Now let (q/a, H% ~,) be an extended system that satisfies the following additional conditions. The input space ~, satisfies
(vi) P~Ke~ CL~(A) for any A,
(vii) P(e_,~.,+r)£re = L_~£ r.
Note that, by virtue of (i), it is sufficient that (vi) hold for some A and that (vii) hold for some t. The output space gca satisfies
(viii) P~a CL2(A) for any A.
Given any t, the map H a induces a map H, from P(*-m,e+r)fa intoL2( t, t q- T) defined by
H~(xe) = P(e,e+r)He(P(e_~,e+r)x), x c ~a ,
x t = P (e_m,e+r)x .
In fact, because of (v) the first equation can be written H~(xe) = P(e,e+r)H~(x). Since by (vii) all x 6 5~ are translates of such xe, He can be regarded as a map from f into @' = L2(0, T). It is required that H ~ be such that
(ix) HesS .
With 5~ o as given it is clear that any extended system that satisfies the additional conditions (v),..., (ix) generates a trajectory {He}, t ~ R 1, in the subset ~ of o~(5~, ~g). We regard an extended system satisfying these conditions as "identifiable" if it is possible to determine Heo for any t o on the basis of input-output data taken over a finite time interval terminating
SLOWLY VARYING SYSTEMS 209
at time t o -1- T. We regard the extended system as "approximately iden- tifiable" if it is possible to determine Hto to within a prescribed tolerance, in a prescribed sense, under the same conditions. As indicated in the Introduction, we are concerned with approximate identifiability.
To make a measurement on the extended system at t will mean to apply an input signal during the time interval (t -- m, t -[- T] and observe the output during the time interval (t, t @ T]. Measurements may be made for overlapping time intervals, but if they are, the successive inputs are necessarily related. To allow a completely arbitrary choice of successive inputs, we suppose that measurements are made at the times t 1 = 0, t~ = m -[- T,..., t~+ 1 = k(m + T) ..... The maps Htk, denoted Hk, describe a discrete trajectory in W. Setting W~ = H~+ 1 -- H~ gives the equations
g~÷l = g~ + W~, (3.1)
y~ = H~(gk), k = 1, 2 .....
to describe the successive measurements, where 2~ is the input signal applied during the interval (t~ -- m, t~ @ T] and y~ is the output observed during the interval (t~, t~ -[- T].
It is to be noted that even though we are defining a system to be an input- output map, there are no serious restrictions on the "state" of the extended system at any time during the measurement process. In particular, since the duration of the input interval is equal to that of the output interval plus a bound on the duration of the system memory, the "output" for each measurement depends only on the "input" for that measurement, and is independent of the state of the system at the beginning of the measurement. This holds true even in the case where measurements are made for over- lapping time intervals.
To get a finite-parameter model for the H~ we employ e-representations. Let ¢ > 0 be fixed at an arbitrary value; then by virtue of conditions (a), (b), (c) on o ~ there exists a standard e-representation (~,¢ i ) as given by Eqs. (2.2) and (2.3), with determining set {x 1 .... , x~}. As before ~ = (~#, f l , 691, ~), and the parameter set 51 C R rim. Also, rr is the orthogonal projection from ~ onto ~ ' , which is an M-dimensional subspace; {e I ,..., eM} is an orthogonal basis for o#,, and the orthogonal basis for R NM is chosen as in Lemma 1.
The fundamental identification problem is to find He, k = 1, 2,...; we are satisfied if we can find a suitably close estimate of h~ = ¢1H~, the corresponding vector parameter for the standard E-representation. Suppose, temporarily, that the extended system is time-invariant so that
210 FISKE AND ROOT
H k=H l fo ra l l k .Let2 k ,k = 1 .... ,N, be chosen to be the N elements of the determining set, thus xk = xl~, k = 1,..., N. Then, from (2.2)
y~' = ,~(y~) = ~o H~(x~)
M M
= Z (~ro H,(x~), %) em= • am~e~, k = 1,..., N . (3.2) qTq,=l m=l
Let h 1 be represented by the column vector
hi = [an ,..., aM1 ; al~ ,..., aM2 ;.-.; a~N ..... aMN]T;
then (3.2) can be written as a matrix equation
y~' = X ,h i , n = 1,..., N , (3.3)
where y~' is now interpreted as a column vector of length M, and X~ is an M × MN matrix which in partitioned form is X n = [0" "'" i 0 i I i 0 i "'" " 0], each block being M × M and the identity matrix appearing in the nth block. Setting
yl =
i _H ' J
and combining Eqs. (3.3) yields
since
y~ = Xh 1 = h 1 ,
X= =I .
Thus, in this case of noise-free identification of a time-invariant extended system, an identification to within ¢ is obtained by making measurements wkh each input signal in the determining set. The parameter values a~ are actually the coordinates of the projections of the outputs on an appropriate M-dimensional subspace wkh a certain orthonormal basis.
SLOWLY VARYING SYSTEMS 211
Now suppose that the extended system is continually changing so that each measurement is performed on a (slightly) different system. From (3.1) and the argument leading to (3.3) we have
h~+ 1 : h~ Jf-w k,
y~' = Xkh~ , k = 1, 2 .. . . . N ,
(3.5)
as the equations describing the skuation at the kth measurement in terms of the standard e-representation. Since ¢1 is linear, w k = ¢lWe. We want to consider blocks of N measurements, and for the purposes of the next section, successive blocks of N measurements, each with the same successive inputs from the determining set {x 1 ,..., xn}. Equations (3.5) are then meaningful for all k = 1, 2,... if X~ = X j , j = k - - [k IN]N , where [a] denotes the integer part of a. Put
and
h n = hNN, yn
, ] Y(n-1) N+I Y'(~-.1) N+~],
y;N J (3.6)
I --XI(W(~_I)N+ I ~- f]O(n_l)N+ 2 + ''' -j- WnN--1) ]
-X~(w(~-1)N+2.+ " ' " + w~N-1) [, __ X N_ol WnN_ 1 I ]
(n+l) N--1 = )_~ g0k. W n
tc=nN
Then the successive blocks of N equations given by (3.5) can be written
hn+l =. h n @. w n,
y~ = Xh ~ + ~ = h n + ~, n= 1,2,..., (3.7)
since X is the NM × NM identity matrix. With the w,~, and hence the w n and e n, unknown, it is now impossible to find any of the h ~ precisely from observations of the y~; however, if each w ~ is small one would expect //n = y~ to be a pretty good approximation to h ~. We have the following simple result.
212 FISKE AND ROOT
LEMMA 3. In the model (3.7) we let wn satisfy only the condition I] w~ [[ ~ ~1, where [[ w~ Ii is the Euclidean norm of w, in RuM. Then h~= y~ satisfies It//" - - h" H ~ (N - - 1)~/, and no estimate of h" formed from y~ can have guaranteed error less than (N -- 1)~/.
Proof. It is sufficient to consider only y: in (3.7), that is, to consider k = 1,..., N, in (3.5). Given the same h:, the two following sets of values for the w~ yield the same observations y j . (1) w 1 = w 2 -- -- WN_: , 1[ Wl I]-----7, Wl G JV"L(X:) (the orthogonal complement of the null space of X:); (2) @: = @2 - - - - @u-: = - -w l - In fact, in either case y~' = Xkhl , k = 1,..., N, since all the w~ and ~ belong to W(X~) for k > 1. The values of h n for the two different cases differ by an element of norm 2(N -- 1)~/. This proves the second assertion. It is easy to see that the choice of w: ,..., wn_ 1 just given defines a worst case for ~n, and the first assertion follows immediately. |
The following theorem summarizes the discussion of the model (3.7) by stating to what extent identifiability of a time-varying system, meeting certain conditions, can be guaranteed. This is, of course, a statement of the situation when noise-free observations of the output are possible. It is preliminary to a corresponding result in the final section that takes account of noisy observations of output.
THEOREM l. Let (~ , H e, f ~) be an extended system satisfying (i),..., (viii) for some m > O. Suppose there is a class of systems 5P satisfying (a), (b), (c) with the same m and arbitrary T > 0 and such that the extended system satisfies (ix) with respect to 5C Finally let it be required that the changes in the extended system during one measurement interval are always such that Ii H~+I - - Hk ][ ~ 3 for all k.
Fix e > 0 and let N be the minimum number of elements in the determining set of a standard E-representation of $f. Then there exists an estimate Hk of Hk, for any k, formed from a preceding block of N measurements such that
11 H~ - H~ II ~< ~ + (N - - 1) N:?&
Proof. Consider the model (3.7) based on the standard e-representation of ~9 v. From Lemma 1 and the condition that II Hk+: - - H~ [[ ~ 3 for all k,
II ~v~ II ~< I14111 ]1 w~ II ~< N1/~&
Then from Lemma 3 the estimate ~n =y, satisfies [1/~ n -hn[ I (N -- 1) N:/28. Put H ~ = ¢:(/~); then from Lemma 2,
N H~ - - ¢:(h~)l] ~ (N -- 1) N:/23.
SLOWLY VARYING SYSTEMS 213
Since h n = ¢I(H.N), and (..cP1,41) is an E-representation, putting HaN = H '~ yields
I[ H,~v -- Hnx II ~ e • (N - - 1) N1/2~,
from the triangle inequality. However, since a block of measurements may be started anywhere, nN may actually take on any integer value, which proves the result. |
Remarks. (1) Conditions (i), (ii), (vi) on £r , together with the com- pactness of ~ , are compatible and not even very restrictive (see Proposi- tion 2.4 of Root, 1975b). The compactness condition of input spaces may seem confining to one used to dealing with linear system theory, but in a practical situation it can often be argued that potential inputs must indeed belong to a compact set because of very real physical constraints (see com- ments in Root, 1971).
(2) It is evident that even in the case of noise-free observations the identifiability of a time-varying system depends on a trade-off between actual rate of system variation on the one hand and the state of prior knowledge (specification of ~) and size of the class of admissible inputs (specification of W) on the other. This is indicated in Theorem 1 in the dependence of N on e, ~ and £r. It would be highly desirable to estimate the dependence of N on e for simple examples of W and 5, but no attempt is made to do this here. The problem is akin to estimating the e-entropy of a set, but is more complicated because of the sense in which the x~ are required to cover W.
(3) In many situations some simple ad hoc construction can be used to obtain a linear-in-the-parameters model of the form of Eq. (3.7) (but usually not with X = I), and the statistical estimation theory of the next section will apply to such models.
We now suppose that the observations of the output of the extended system are contaminated by additive noise. Then Eqs. (3.1) are replaced by
H~+I = H~ + W~,
zk = Hk(X~) + V~,
where V~ is a random variable 3 taking values in L2(O, T). Assume
(3.9)
That is, we take V~ to be a strongly measurable map from a complete probability space into L2(O, T) (see, e.g., Barucha-Reid, 1972).
214 FISKE AND ROOT
(o 0 E (Ve) exists and is equal to zero.
(fl) g~ has a covariance operator / ' (which will necessarily be self- adjoint and of trace class).
(7) V~ and Vj are uncorrelated for k =/= j ; i.e., the cross-covariance operator is zero.
The development leading to (3.7) can now be repeated with the only change being that in each observation equation there is an additive noise term. Let z~' be defined to be ~r(z~). Then, since ~r is linear, Eqs. (3.5) are replaced by
hk+l = h~ + w~, (3.10) zk' = X~hk + vk ,
where v k = ~V~ is represented as a column vector with respect to the basis {e 1 .... , era}. The random vectors v~ have mean zero, are uncorrelated and have covariance operator ~/~, which will be represented by the covariance matrix R. Using the same convention as before as regards superscripts versus subscripts leads to
h~+l = h~ + w~' (3.11)
where v n is a random vector with mean zero and covariance matr ix /~ = N-block diagonal [R i R : ... i R]. Furthermore, v n and v m are uncorrelated for n ~- m.
Equations (3.11) describe the model we consider for identification of a time-varying system in the presence of output noise. In the noise-free case one can compute ]~n = h n + en as an estimate of hn; in the noisy case one can make a statistical estimate f~n of h n + e ~, which is again regarded as an estimate of h% In both cases there is nonremovable error due to the presence of ¢~. In the noise-free case one block of N measurements suffices; in the noisy case continuing measurements allow for a reduction of statistical error, as usual. In the next section the statistical estimation problem is treated.
4. PARAMETER ESTIMATION IN THE TIME-VARYING SYSTEM MODEL
The estimation problem posed by (3.11) is a "mixed" problem which is somewhat unconventional; the terms w n and en are nonstochastic dis- turbances while the term v ~ is stochastic. Even if the e~ were zero, the
SLOWLY VARYING SYSTEMS 215
usual Kalman recursive linear least-mean-squared (LMS) error estimation theory would not apply because of the nature of w% Since such an estimation problem is at least of some academic interest, 4 we generalize it a little, considering however only the case e n ~- O, before constructing an estimate. This more general version also is of use for some time-varying system models formulated on an ad hoc basis with no reference to standard E-repre- sentations, as mentioned above. When the solution we obtain for the estima- tion problem with E ~ = 0 is applied to (3.11) an adjustment has to be made to account for the presence of e n and the extra error it causes, and this is done in the next section.
The criterion used for the quality of estimates is, as usual, mean-squared error. However, in the model used the mean-squared error depends on the values of the unknown, bounded, but nonstochastic quantities. What we would like to do is minimize the maximum mean-squared error. However, the recursive linear estimate constructed only minimizes an upper bound on the maximum mean-squared error, so a comparison lower bound is obtained to provide a guarantee that the estimation procedure is not far from optimum. No linear estimation procedure can yield an estimate with mean-squared error less than the number given by the lower bound for all possible values of the parameters.
We consider
h-+l = h,~ + w ~, (4.1)
z ~- -Xh ~+v '~, n= 1,2 .....
where again h ~ is the vector parameter to be estimated, but where the as- sumptions are now: h ~ is a p-dimensional vector; z n is an r-dimensional vector, r >/p ; X is arbitrary except that ~V(X) = 0; 5 v ~ is a random vector with first and second moments satisfying
Ev n = O,
Evn(v~) r = R~,~, (4.2)
4 Recursive estimation for models where the noise is unknown-but -bounded has been studied by Schweppe, Schlaepfer, Bertsekas and Rhodes, and others (see Schweppe, 1974, and other references given there). The estimator developed here is different f rom any previously discussed as far as we know.
That part of h that lies in the null space of X is not estimable anyway, so there is no real loss in generality in cutt ing down the parameter space, if necessary, to coincide with dV'z(x).
643/3z/3-z
216 FISKE AND ROOT
with the covariance matrix R strictly positive definite, and w" is bounded componentwise, as will be specified. Let {¢1}, i = 1,..., p, be a set of ortho- normal eigenvectors belonging to (XrR-1X) -1, i.e.,
(XrR-~X) -~¢ i = af ter , i = 1,..., p. (4.3)
Then it is required that ~
[¢~ rw" I = I(w", ¢~)1 ~< ~/~, i = 1,...,p; all n. (4.4)
Note that since JV ' (X ) : 0 and R is strictly definite, (XrR-1X) -~ exists and the eigenvalues {ai =} are real and positive.
A recursive estimator for h ~ is now developed. It is suggested by the nonrecursive modified linear unbiased minimum-variance (modified LUMV) estimator described by Root (1973); there is also a vague resemblance to the standard Kalman recursive estimator. To start, let h 1 be the ordinary LUMV estimate of h 1 using the observation z 1, i.e.,
h 1 = (XrR-xX) -1 XTR- l z I = Cz 1, (4 .5)
where the matrix C is defined implicitly by (4.5). As is well known (Albert, 1972), the mean-squared error in this estimate satisfies
E [[ h ~ - - h 1 [I 2 = EII C( xh l ÷ vl) - - hI II 2
= EII cv~ I[ 2 = Tr[(XrR-1X) -1
= E a* ~" (4.6) i=1
Here and in the rest of this section the norm is the Euclidean norm in R ~. It is convenient to introduce the notation b~ ~ aft, i = 1 .... ,p, so that
9
EII ha - - h ~ Ii 2 = Z b~,. (4.7) i=1
Now consider the second observation
and rewrite this as
z 2 = Xh ~ + v ~ = Xh 1 ÷ Xw 1 ÷ v 2,
z~ - xh l = X[(hl - - h~) + wq + v 2.
6 We occas ional ly use inner -product space notat ion in th is section.
(4.8)
notat ion instead of vector -mat r ix
SLOWLY VARYING SYSTEMS 217
Define 0 2~z 2 -Xh 1 and ~1~ (h ~-h t ) - /w l=h 2 - ]¢ so that (4.8) becomes
0 2 : X~ 1 -]- 7) 2. (4.9)
The LUMV estimate of ~1 is
CO 2 ~ ~1 + Cv 2. (4.10)
Equation (4.10) can be interpreted as describing a linear model for estimation in which the linear transformation is the identity, the "observation" is CO 2 and the "noise" is Cv 2. Since the LUMV estimate of ~1 in (4.10) is just the observation itself, there is no conflict in thinking of (4.10) this way. We now wish to use the a priori information that we have about ~, i.e., the information in (4.4) and (4.7), to modify the LUMV estimate of so as to obtain a new estimate with smaller mean-squared error.
Consider a completely arbitrary linear estimate ~1 of ~x in (4.10) and expand it in terms of the {~bi}. Thus,
~0 ~ = ~ a,,(CO 2, ~j) ¢,, (4.11)
i,j=l
where the {%} are real numbers. The error in this estimate is
~1 ~1 = ~ [(aii - - 1)(~1, ~i)"q- ~ ai.$(~ 1, t~j).q- ~ g/j(C'v 2, ~,)] $ i . i=1 J=l j=l
(~ei) (4.12)
Then, from (4.2) and (4.3), and using the facts that ~ is uncorrelated with v 2 and that CRC r = (XrR-1X) -1, it follows that
EII~I ~[12 E (a~, 1)(~ ~,$4)+ a~j(~,$~ .+_ ~ 2 2 i=1 ~=1 i,~=l
(J~) (4.13) We should like to choose the {ai~} so as to minimize the supremum of E II ~1 _ ~111~ for all h and all w satisfying (4.4). However this minimization problem appears to be extremely complicated, and perhaps not even worth- while in view of what follows, so we content ourselves with minimizing the obvious upper bound to (4.13) given by
[ ]' E11~1--~1[12--.<2~ (a i , - -1)2E[ (~l ,~i ) ]2+2~ E ~ aij(~l,$j) i=l i=1 J=l
+ y~ 2 2 (4.14) aijG ~. . i,J=l
218 mSKE AND ROOT
To minimize the right side of i @ j. Making this assignment
i=1 Now
z[(~ ~, ¢3] 2 = E[ (h ' - - h I +
SO
(4.14), one should obviously set ai~ = 0 for reduces (4.13) to
~o (aii __ 1)2 E[(~I, ¢,)]2 _? ~ a, ,~ ~ 2. (4.15)
i=1
= E[ (Cv l , ~i)2] _ 2(w 1, ~i) E[( C~I, ~i)] -~ E[( g})l, ~i)2]
<~ c,~ 2 + 2~7,,~ + ~ = (bli + Vi)2, (4.16)
1o 2 2 E II ~ - - ~1112 ~ ~ [(a. - - 1) 2 (bl, + W~)2 + a .~ 1. (4.17)
Obviously the bound in (4.16) could be improved to b~i + ~i 2, since the unbiasedness of ]21 causes the cross term to vanish. However, this calculation is used all over again in an inductive proof, and in general the preceding estimate h ~ is biased so the use of the Schwarz inequality is necessary.
The upper bound in (4.17) is minimized by putting
(bli + ~)2 i = 1 ..... p. (4.18) aii = (bli @ ~i)2 _~_ (yi 2 '
We let ~1 denote the estimate of ~1 given by (4.11) with aii as given by (4.18) and a,a = O, i @j . Since h 2 = h 1 @ (1, we take as our estimate o fh 2,
h2 ~ hi + ~1. (4.19)
It is perhaps interesting to note that the definition of h 2 in (4.19) is analogous to a key step in a standard derivation of the Kalman recursive least-mean- squared error estimate; the difference is that there the step is shown to lead to an optimum estimate by a projection argument, whereas here there is no such structure available. With the use of (4.17) a short calculation shows that the error in h 2 satisfies the following inequality,
EI] h~ - - h21] 2 = E II(h ~ + ~1) _ (hi + ~l)ll2
= E [l ~1 - - ~ II 2
i=l i=1
= X b~,, (4.20)
SLOWLY VARYING SYSTEMS 219
where
b~ A (b~ + ~;)~ ~? (bli + Vi) 2 + ~P "
Repetition of the procedure just described leads to the recursive estimate
h~, = hn--1 _1__ ~n--1
~=1 (b(n-1)i + ~i)2 + ai 2
b~ = (b("-l)i + ~)2 ~2 i = 1 ..... p (4.21) (b<._a)~ + n~) ~ + ~C" '
with initial conditions given by
h 1 ~ Czl~
b2i = Gz2, i = 1 .... ,p. (4.22)
THEOREM 2. Given the statistical model defined by Eqs. (4.1) and subject to the conditions of (4.2) and (4.4) and to ~A/'(X) = O. The recursive estimates defined by Eqs. (4.21) and (4.22) satisfy the error bound
23 E [I h" - h" ii z <~ ~ b~. (4.23)
Proof. At the kth stage the model (4.1) gives the observation equation
z k =Xh ~+v ~=Xhk- l+Xw ~- l+v k,
which can be rewritten as 0,~ = X~k-1 + v k,
where O k a= z~ _ Xhl. 1 and
~7~-i ~ (h~-i __ hk-i) _}_ wk-i = h k __ hk-i.
From these relations Eqs. (4.21) follow for n = k exactly as they did for n = 2 and the calculation of the accompanying error bound (4.23) also goes the same way. The assertion follows by induction. |
The behavior of the error bound in (4.23) as n --+ oo is of concern. Before presenting a result for this limiting case we state without proof a simple preparatory lemma (see Fiske, 1975b).
220
LEMMA 4. and satisfies
(i) lim~_~0+f(x ) > 0,
(ii) lim~_~ f (x) = constant < ~,
(iii) i f (x) > O, x > O,
(iv) f"(x) < O, x > o, then~
(a) (b)
FISKE AND ROOT
I f f is a real-valued C ~ function defined on the positive half-line
f (x ) = x has a unique solution x* for x > O,
the sequence xn = f(x~_z), xl > O, converges to x*.
THEOREM 3. The equation
(xl/2 + ~i)2 °l~ = x, x > 0, (4.24) (xll ~ + m) 2 + c~ ~
with ~ll, ai > O, has a unique solution xi*. For the recursive estimates of Theorem 2, l im,~ b~i = xi*.
Proof. Follows from Lemma 4 and (4.21). I
A graph of xi* is shown for certain values of ~7i and a i in Fig. J.
Upper Bound (Recurslve)
/ / /
%- I / /Lower Bound 0.2I / / {X=I,R Diogonal)
oov 0 0.1 0.2 0.3 0.4 0.5
~/i/°'i
FIc. 1. Upper and lower bounds on MSE (ith component).
SLOWLY VARYING SYSTEMS 221
EXAMPLE. For a system with input-output relation charaeterizable by a linear integral operator, or more generally by a polynomial integral operator (Volterra polynomial), there is an easy and obvious way to get a linear- in-the-parameters model of the form of (4.1) to which Theorem 2 applies. We indicate how this goes for a second-degree polynomial, and then consider details in a simple special case. It will be clear how this procedure generalizes to higher degree polynomials. Two comments should be made, even though they are obvious. First, some approximation is necessary to reduce the model to a finite-dimensional one; second, this is an example of an ad hoc system model and has nothing to do directly with e-representations.
Suppose the system is described by
~r+T er (r+T) y(t) = ,-m hl(t , s) x(s) ds + "~t[['r-m) h2(t' sl' s~,) x(sl) x(s2) ds 1 ds2,
r~t~r+T, r~R 1, (4.25)
z(t) = y(t) + v(t), (4.26)
where the kernels h 1 and h~ are square-integrable in all their arguments over any bounded set; where x is square integrable on any finite interval; where rn ~ 0, T > 0 are arbitrary but fixed, and where v(t) is a wide-sense stationary stochastic process with mean zero. The pair (hi, h2) represent the system, x is the input, v is noise, and z is the observed output. T is the duration of an observation interval and m is the maximum duration of the finite memory.
Let r be temporarily set equal to zero; let {¢i} and {~i} be complete orthonormal systems (c.o.n.s.) for L~(--m, T) and L2(0, T), respectively. Expansion of both sides of (4.25) with respect to these c.o.n.s, yields
cz = ~ b~,a i + ~ b~ilqa,fli2, k = 1, 2 ..... (4.27) i il,i ~
where
. . = (x, ¢ . ) ,
T T
T T T b/cil'/] = fO ~m ~m h2(t' Sl, $2)¢i1(31)¢~:2($2)~k(t) dsl ds~. dt.
222 FISKE AND ROOT
Since it may be assumed without loss of generality that h 2 is symmetric in s 1 and s2, it may be assumed that b~hi, = bki~q. Under rather obvious compactness conditions on the class of systems and class of inputs in question (e.g., if h 1 , h~, and x are required to belong to compact subsets of the appropriate L l spaces) Eqs. (4.27) can be approximated uniformly in an L ~ sense by a finite system of equations in finitely many unknowns:
M M ck = E bk~a' q- E bk,,,,ailai=, k = I .... , K. (4.28)
i=1 i1,~2=1
These equations can be written in the form of the linear vector-matrix equation y = Xh, where h is the vector of the b's, y is the vector of the c's, and X is determined by the a's.
Equations (4.28) correspond to one measurement, as the term is defined above, at ~- = t 1 = 0. Successive measurements can be made independently at t~ ~ m q-T , t a = 2(m q-T) , etc. With the c.o.n.s, translated appro- priately, each measurement will correspond to a system of equations (4.28) with the input for that measurement determining the a's and the condition of the system during that measurement determining the b's. I f the b's do not change for a sufficient number of measurements, input signals can always be chosen so that the b's can be uniquely determined (see Fiske, 1975b).
Now we suppose that the system in question is causal, and we temporarily also assume it to be time-invariant. The ¢~ and ~/~ can then be chosen to take advantage of these conditions. In particular, define ¢1 ,..., eM by
( m ~1[2 (i - - 1)(T "4- m) i (T + m) ¢~(t) = \T - - -~] ' - -m+ M ~<t<- -m+ M '
= 0, for all other t ~ [--m, T].
The completion of the system {¢i} is irrelevant. There is no harm done if m is increased if necessary so that it is exactly divisible by 8 = (T + m)/M. The ~7i can now be defined to be functions consisting of a simple step of length 8, just as are the ¢i. However, because of the causality, bounded memory and time-invariance there is nothing lost by stopping the observation interval at 8 and defining only ~71 as a step of height (1/8)1/~ on [0, 8].
Thus, T=3, K= 1, M=(m/8)+l , and Eqs. (4.28) reduce to the single equation
M M
C --~ 2 biai ~- 2 bijaiaj, (4.29) i=1 i,3=1
SLOWLY VARYING SYSTEMS 223
where
c -- ~1/~ y(t) dt,
b4 = ~ -~+(~-1)~ fo h(t, s) dt ds,
1 f-m+j f~h( t , s l ,s2) dtds lds~ " b4j = ~ --fa+(4--1)~ --'m+(J--1)~ JO
It will be observed that, since b~j = b~i, there are N = ½M(M + 3) un- known b's in (4.29). We therefore consider a block of N successive measure- ments with N different inputs (i.e., N different sets of the ai) chosen so that the b's are uniquely determined. Using a superscript to denote the number of the measurement, we then have from (4.29)
M M c '~=~b,a#+ ~ b4,a4~a/~, n=l .... ,N . (4.30)
4=1 z,]=l
Equations (4.30) can be rewritten in vector-matrix form,
y =- Xh, (4.31)
where
y ~--~ [c 1, C 2 ..... cN] T,
h = [b 1 , b n ; bz, b2~ ;...; bM, bMM ; 2bn, 2bla .... ,2h im ; 2b2a ,...; 2bM_I,M] T,
and a 1 a 1 ] a l 1, (al l) 2, a21, (a21) 2,-.-, (a2) 2, alia21, ..., M-1 M
X ~ ... o , ' " , aN N ! al N, (alN) 2 .... , alNa2 N M_laM_I ]
From (4.26) and (4.31), there follows the equation
z = Xh + v, (4.32)
where v = Iv 1 ..... v~] r, and v 4 is equal to 3-I/2 times the integral of the noise over the observation interval of the ith measurement. Ev 4 = O, E(v4)2 = constant = a 2, say; and we shall assume E(v4vj) = 0 for i =~j. Thus, E(vv r) = a2I.
I f it is now presumed that h (and hence the b's) changes somewhat from one block of N measurements to the next, and if the same cycle of inputs
224 FISKE AND ROOT
is used for each block of measurements, we recover Eqs. (4.1), where the superscript n denotes the number of the block of measurements. Theorem 2 applies with R = a2I and
C = (XrR-1X) -1 XTR -1 = X-L
As an illustration, consider the simplest nontrivial case with M = 2. Then N = 5 so that five measurements to a block are necessary. The inputs may be chosen so that
[ i l l 0 0 i l - - 1 0 0
X= 0 1 1 . 0 - -1 1 1 1 1
Then,
C =X -1 =
_2o2 o oO 1 0 2 --2 . 0 2 2
4 0 --4 0
The eigenvalues and corresponding eigenvectors of (XrR-1X) -1 = 0-2(XTX) - I are
0-13 = 3.33~ ~, 0-2 2 = 10.2
0.32 = ½0-2,
0"43 ~--- 10-2 ,
%3 _ 0.150-3,
¢1 = [0.166, 0.166, 0.166, 0.166, --0.944] r,
¢5 = (1/21/~)[-1, 1, 0, 0, 0y , ~z = (1/21/2)[ 0, 0, --1, 1, 0] r,
¢, = ½[1, 1, --1, --1, 0] r,
¢5 = [0.474, 0.474, 0.474, 0.474, 0.331] r.
The ~/i remain to be chosen. If limits on the possible variation of (b 1 , bn, b2, b~2,2b12 ) between blocks are taken to be (0 a , 02 , 08 , 0a, 05), then the ~/i should be
~/i = max [%, %, %, %, o~5]T [¢i],
In any given situation the choice of the ~h will probably be something of a guess. If they are chosen conservatively (too large), then the estimates will not be quite as good as they might be; if they are chosen too small the error bounds will be incorrect. Suppose in this example the ratio ~1i/0- =
SLOWLY VARY ING SYSTEMS 225
0.5. Then ~i/171 = 0.28, and from Fig. 1 one sees that the upper bound on the final mean-squared error in the ¢l-component is approximately 0.5a1~ = 1.7a 2, as contrasted with a12= 3.33a 2, the error given by the LUMV estimator.
Lower Bound on Maximum Estimation Error
As previously mentioned, since the error bound just derived is presumably not the best possible, it is important to establish a lower bound on upper bounds to mean-squared error, for comparison. This is done by a contradic- tion argument using standard results from the Kalman linear least-mean- squared error estimation theory. The result is for linear estimators only.
Suppose that all the conditions for Theorem 2 are satisfied, and in addition each w ~ is a stochastic vector satisfying the conditions
P[( w~, ~) = ~d = 1/2 = P[(w ~, ~) = --~i],
E[(w '~, ~)(w ~, ~j) = ~i2~j8~ , (4.33)
E[(w ~, f ) (v m, g)] = 0,
for i, j = 1,..., p, all m and n, and all f and g ~ R ~ in the last equation. Note that these additional conditions in no way violate the original ones. Then Ew ~ = 0 and w ~ has covariance Q satisfying
(Q~bi, ~b~.) = ~8~j . (4.34)
The equations (4.1) can now be interpreted as representing a controllable, observable linear dynamical system model with uncorrelated state and observation noise. To apply the linear LMS recursive estimation theory to (4.1) it is necessary to assume that h 1 is a stochastic vector with known mean and covariance. It is convenient to take
Eh 1 = Cz 1 .~ h 1,
cov(h 1 - - Eh ~) ~ PI = (XTR-~X)-h (4.35)
The error covariance matrix for the LMS est imate/~ of h ~ given the observa- tions zZ,..., z ~ and initial data as in (4.35) satisfies the recurrence equation
P,~ = (P,_a -~ Q)(I - xr[x(P~_~ ~- Q) x r @ R] -~- X(P~_~ q- Q)},
n = 2, 3 ..... (4.36)
(See, e.g., Astr6m, 1970, with P1 given by (4.35).)
LEM~A 5. The sequence {Tr P,~}, n = 1, 2,..., with P,, given by (4.35) and (4.36) is monotonically nonincreasing.
226 FISKE AND ROOT
Proof. We note first that Tr P2 ~< Tr P1. In fact, given the data {z2; hi, P1}, Tr P2 is the mean-squared error of the LMS estimate h 2. But the LUMV estimate of h ~ has mean-squared error Tr[(XrR-1X)-I] = Tr P1. Since this is another linear estimate based on (some of) the same data it cannot have smaller mean-squared error than the LMS estimate. Hence Tr P~ ~ Tr P1.
Now observe that because of the stationarity of the statistics in the model, the LMS estimate of h n given {z 2 ..... z~; hi, P1} has the same error covariance matrix, P~, as does the LMS estimate ofh n+l given {z 3 ..... z~*+l; it~ = Cz2, P1}. Thus, given {z2,..., zn+1; h I, P1}, one can form a (nonoptimum) linear estimate of h ~+1 by first computing the LUMV estimate ]z 2 = Cz 2, which has error covariance matrix P~, and then replacing /~2 in the recursive LMS algorithm by h 2. This is equivalent to finding the LMS estimate of h n+l given {zz,..., zn+l; ]~2, P1}. By the remark just made, the mean-squared error of this estimate is Tr P~, which must then be greater than or equal to the minimum mean-squared error Tr P~+z. |
Let e ~ lim~oo Tr P~+I, then e is a lower bound on uniform upper bounds on mean-squared error. More precisely, we have:
THEOREM 4. Let the conditions of Theorem 2 be satisfied. Then for any linear estimator of the system parameters h n and for any m ~ 1 there exists an admissible sequence {hn}, n = l, 2,..., m such that the mean-squared error in the estimation of the h ~ is greater than or equal to e.
Proof. We are free to suppose that h 1 is a stochastic vector with covariance Pz and that the w ~ are stochastic vectors satisfying (4.33) and bounded as required. Denote an arbitrary linear estimate of h ~ by /~. Suppose that for some m ~ 1
E{II ~ - - h~ I12 I h 1, wI,.-., w ~-l} < e,
for every h 1 and every sequence {w 1 ..... w m-l} meeting the conditions just imposed. Then
g I1 h~ - h ~ II 2 = E[E{Jr h~ - - h ~ ]]2 ] h 1, wl , . . . , w~- l} ] .
But by Lemma 5 the LMS estimates of the h ~ for the given conditions all have mean-squared error Tr(P,,) /> e. Thus, by the minimum mean-squared error property of the LMS estimates there is a contradiction. |
For the very special case that X is the identity this lower upper bound can easily be computed. We compute it here because it gives an immediate
SLOWLY VARYING SYSTEMS 227
comparison with the upper bound of Theorem 2, and also because, simple as it is, it is the case of interest in the general theorem of the last section.
With X = I the ¢ i , i = 1 .... ,p, are orthonormal eigenvectors of R. Thus, if they are chosen as a basis for a vector-matrix formulation, R is the matrix diag[a12 ..... %2], and from (4.34) Q is the matrix diag[~?le,..., %2]. Furthermore, (4.36) then becomes, after a small amount of manipulation,
P~ = (P,~_~ + Q)(P~_~ + Q + R)-~R, n = 2, 3 ..... (4.37)
Since/)1 = R is diagonal, each P~ as given by (4.37) is diagonal. In particular, if p,~_ 1 diag[fl~_m, ~ 2 ~ 2 ,..., fi(~-l)~], then P~ diag[fi~l ]3~ 2 = ~] , = ]~(n--1)2 , ~'" ", where
2 2 2 = ~ ~=-~2 , i = 1 .... ,p. (4.38)
P(n-1)i ~i ~- i
It follows from Lemma 4 that the fi~, converge monotonically to the fixed point of the function
(X @ 7]i 2) O'i 2 X ~ 0, f (x ) = x -+- 7Jig -~ (Yl 2 '
which is
Thus,
0.2 1/2 --.~vi + 1 + 4 ~qi ° (4.39)
P
e = lim Tr(Pn) = ~ /5i 2. (4.40) n~oo
See Fig. 1 for a comparison with the upper error bound for the recursive estimator.
5. IDENTIFICATION OF A SLOWLY VARYING SYSTEM
WITH NOISY OBSERVATIONS OF OUTPUT
As previously indicated, there is really no way to distinguish between h ~ and e n in (3.11) in estimates based on z n. So, in analogy to what was done in the noise-free case, we estimate h ~ + e ~ and simply allow for the error possibly introduced by E n when the estimate obtained is used as an estimate of h n. From (3.7) yn = h,~ + En, so (3.11) can be rewritten in the form
y~+l = y~ + V~, (5.1) z n = yn + v s.
228 FISKE AND ROOT
Since [1 r ~ [] = I](Y ~+1 -- hn+l) + ( hn+l - - hn) + ( h~ -- Yn)l] it follows from Lemma 3 that if lI wn ][ < ~/then
II ~" I1 ~< 2(N -- 1)~/+ N~ = d, (5.2)
where d is defined-by (5.2). We define an estimate for ]z ~ of h n in (3.11) to be the estimate ~" of y~ in (5.1) specified as in Theorem 2. The following theorem is an extension of Theorem 1 that accounts for the statistical errors in the system parameter estimates when the output observations are noisy.
THEOREM 5. Let (~, H a, Y£~) be an extended system satisfying all the conditions of Theorem 1. Suppose that the output for each measurement is observed in the presence of additive noise V~, as in (3.9), and that the Hilbert- space-valued random variables V~ satisfy the conditions (a), (fi), (7)following (3.9). Fix ~ > 0 arbitrarily and let N be as in Theorem 1. Then, given any ~' > O, there exists an estimate I:I~ of H~, for any k, formed from a finite sequence of blocks of N measurements such that
tl ~q- - - H . l[ ~< E + (X - - 1) N1/23 + ~, (5.3)
where ~ is a nonnegative random variable satisfying
MN
E~ ~ <~ ~ Bt 2 + d. i=1
The length of the sequence of blocks depends on #. Each constant B i iS the unique nonnegative solution of
(B~/2 + d) ~ ai ~ (B 1/~ -k d) ~ + a~ 2 = Bi , i = 1,..., MN, (5.4)
i
where the ~2, i = 1 .... , MN, are the eigenvectors of ~F~r (each ai 2 is repeated N times), and where
d = 2(N - - 1) Na/23 + N*/2& (5.5)
Proof. We maintain the convention that superscripts refer to blocks of measurements. Put /~"= ~bl(h" ) = ~bl(#" ). From Theorem 1 it follows that if the estimate of 3~ n is perfect, /]r~ = Hn, where H n satisfies (3.8). From Lemma 2 it follows that if ]1 h n - - yn [[ ~ ~, then
II-'q'~ - Hn II ~< [I B '~ - H '~ It + II Er- - - H,~ II
~< ~ + E + (N - - 1) N1/~,
SLOWLY VARYING SYSTEMS 229
since $1(h n -- yn) =/4~ - - H% This much is true no matter what estimate is used for hn; however, consider h n as defined above. From Theorem 2
~9 E I1 h- - h~ lj ~ ~< Z b~,,
where now p ~ MN, and where the b~i are defined recursively by (4.21) and (4.22). Since X -~/ , the ai ~ in these equations are now simply the eigenvalues of the covariance operator R = ~r_P~r. Since (5.2) is to hold, each ~?i, i = 1 .... ,p , is replaced by d, which, in terms of S is as in (5.5). The assertion then follows from Theorem 3 for H n, which corresponds to the nNth measurement. The conclusion also follows for any H,~ ; one can simply delay the beginning of the first block of measurements by the appropriate amount so that n = kN for some k. |
RECEIVED: May 30, 1975; REVISED: January 19, 1976
REFERENCES
ALBERT, A. (1972), "Regression and the Moore-Penrose Pseudoinverse," pp. 86-93, Academic Press, New York.
ASTR~SM, K. J. (1970), "Introduction to Stochastic Control Theory," pp. 225-237, Academic Press, New York.
BARVCHA-REIo, A. T. (1972), "Random Integral Equations," pp. 14-16, Academic Press, New York.
DIEUDONNI}, J. A. (1960), "Foundations of Modern Analysis," pp. 126-129, Academic Press, New York.
FISKE, P. H., ANn ROOT, W. L. (1974), Estimation of parameters in systems of relatively slow time variation, in "Proceedings of the Twelfth Annual Allerton Conference on Circuit and System Theory," pp. 1039-2047, Univ. of Illinois Press, Urbana, Illinois.
FISKE, P. H. (1975a), Modeling and identification of time-varying systems from noisy observations, in "Proceedings of the Johns-Hopkins Conference on Information Science and Systems," The Johns Hopkins University, Baltimore.
FISKE, P. H. (1975b), "Modeling and Identification of Relatively Slowly Varying Systems," Ph.D. Thesis, The University of Michigan.
ROOT, W. L. (1971), Some general structure theory of systems to be used in identifica- tion and measurement, in "Proceedings of the Fifth Annual Princeton Conference on Information Sciences and Systems, pp. 23-19, Princeton University, Princeton, N. J.
RooT, W. L. (1973), On the modelling and estimation of communication channels, in "Multivariate Analysis, III" (Krishnaiah, Ed.), pp. 61-78, Academic Press, New York.
230 F~SKS AND ROOT
ROOT, W. L. (1975a), On the modeling of systems for identification, I: C-Representa- tions of classes of systems, S IAM J. Control 13, 927-944.
ROOT, W. L. (1975b), On the modeling of systems for identification, II: Time-varying systems, S IAM J. Control 13, 945-974.
SCHWEPPE, F. C. (1973), "Uncertain Dynamic Systems," pp. 154-160, Prentice-Hall, Englewood Cliffs, N. J.
Comments







