Friday, 18 February 2011

Principal component analysis

Definition and examples from US Treasury market
As often occurs in finance, an analogy with physical systems suggests an approach.
Observed shifts in the yield curve may seem complex and somewhat chaotic. In
principle, it might seem that any point on the yield curve can move independently in
a random fashion. However, it turns out that most of the observed fluctuation in
yields can be explained by more systematic yield shifts: that is, bond yields moving
‘together’, in a correlated fashion, but perhaps in several different ways. Thus, one
should not focus on fluctuations at individual points on the yield curve, but on shifts
that apply to the yield curve as a whole. It is possible to identify these systematic
shifts by an appropriate statistical analysis, using techniques inspired by the study
of physical systems.
The following concrete example, taken from Jennings and McKeown (1992), may
be helpful. Consider a plank with one end fixed to a wall. Whenever the plank is
knocked, it will vibrate. Furthermore, when it vibrates it does not deform in a
completely random way, but has only a few ‘vibration modes’ corresponding to its
natural frequencies. These vibration modes have different degrees of importance,
with one mode – a simple back-and-forth motion – dominating the others: see
Figure 5.2.
One can derive these vibration modes mathematically, if one knows the precise
physical characteristics of the plank. But one should also be able to determine them
empirically by observing the plank. To do this, one attaches motion sensors at
different points on the plank, to track the motion of these points through time. One
will find that the observed disturbances at each point are correlated. It is possible to
extract the vibration modes, and their relative importance, from the correlation
matrix. In fact, the vibration modes correspond to the eigenvectors of the matrix: in
other words, the eigenvectors, plotted in graphical form, will turn out to look exactly
as in Figure 5.2. The relative importance of each vibration mode is measured by the
size of the corresponding eigenvalue.
Let us recall the definitions. Let A be a matrix. We say that a non-zero vector v is
an eigenvector of A, with corresponding eigenvalue λ, if A.v = λv. The eigenvectors of
a symmetric matrix that correspond to distinct eigenvalues must be mutually
orthogonal, i.e. ‘independent’. Note that eigenvectors are only defined up to
a scalar multiple, but that eigenvalues are uniquely defined.
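
The definitions can be checked numerically. The following is a minimal sketch in Python with NumPy; the 3x3 symmetric matrix is a hypothetical example chosen only for illustration and has nothing to do with yield data.

```python
import numpy as np

# Hypothetical symmetric matrix, used only to illustrate the definitions.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is meant for symmetric matrices: it returns the eigenvalues in
# ascending order and the eigenvectors as the columns of V.
eigenvalues, V = np.linalg.eigh(A)

# Defining relation A.v = lambda v, checked for all pairs at once
# (V * eigenvalues scales column j of V by eigenvalues[j]).
defining_relation_holds = np.allclose(A @ V, V * eigenvalues)

# Eigenvectors of a symmetric matrix are mutually orthogonal: V^T V = I.
orthogonal = np.allclose(V.T @ V, np.eye(3))

# An eigenvector is only defined up to a scalar multiple: rescaling it
# gives another eigenvector with the same eigenvalue.
scaled = -3.0 * V[:, 0]
rescaled_ok = np.allclose(A @ scaled, eigenvalues[0] * scaled)
```

Because of the scalar-multiple freedom, the sign of each column of V is arbitrary; only the direction is meaningful.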
Suppose A is a correlation matrix, e.g. derived from some time series of data; then
it must be symmetric and, provided no series is an exact linear combination of the
others, positive definite (i.e. v.A.v > 0 for all non-zero vectors v). One
can show that all the eigenvalues of such a matrix must be real and positive. In this
case it makes sense to compare their relative sizes, and to regard them as ‘weights’
which measure the importance of the corresponding eigenvectors.
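
As a rough illustration of eigenvalues used as weights, the sketch below builds a correlation matrix from simulated data. The data are entirely hypothetical (four series driven by one common random factor plus independent noise); the point is only that the eigenvalues are positive, sum to the number of series, and can therefore be normalised into weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 250 observations of 4 series that share one common
# driver plus independent noise. Not real market data.
common = rng.normal(size=(250, 1))
data = common + 0.5 * rng.normal(size=(250, 4))

# Correlation matrix with the series as columns (rowvar=False).
corr = np.corrcoef(data, rowvar=False)

# eigvalsh returns the eigenvalues of a symmetric matrix, ascending.
eigenvalues = np.linalg.eigvalsh(corr)

# The trace of a correlation matrix equals the number of series, so the
# eigenvalues sum to 4 and dividing by the sum gives relative weights.
weights = eigenvalues / eigenvalues.sum()
```

With this construction most of the weight falls on the largest eigenvalue, reflecting the single common driver.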
For a physical system such as the cantilever, the interpretation is as follows. The eigenvectors describe the independent vibration modes: each eigenvector has one
component for each sensor, and the component is a (positive or negative) real number
which describes the relative displacement of that sensor under the given vibration
mode. The corresponding eigenvalue measures how much of the observed motion of
the plank can be attributed to that specific vibration mode.
This suggests that we can analyze yield curve shifts analogously, as follows. Fix a
set of reference maturities for which reasonably long time series of, say, daily yields
are available: each reference maturity on the yield curve is the analog of a motion
sensor on the plank. Construct the time series of daily changes in yield at each
reference maturity, and compute the correlation matrix. Next, compute the eigenvectors
and eigenvalues of this matrix. The eigenvectors can then be interpreted as
independent ‘fundamental yield curve shifts’, analogous to vibration modes; in other
words, the actual change in the yield curve on any particular day may be regarded
as a combination of different, independent, fundamental yield curve shifts. The
relative sizes of the eigenvalues tell us which fundamental yield curve shifts tend to
dominate.
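
The procedure just described can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the code behind the tables in this section: the function name principal_yield_shifts, the four maturities and the simulated yield history are all assumptions made for the example.

```python
import numpy as np

def principal_yield_shifts(yields):
    """PCA of daily yield changes.

    yields: array of shape (days, maturities) holding daily yield levels.
    Returns eigenvalues (largest first), eigenvectors (as columns, in the
    same order) and the share of total weight of each eigenvalue.
    """
    changes = np.diff(yields, axis=0)           # daily changes in yield
    corr = np.corrcoef(changes, rowvar=False)   # one variable per maturity
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]       # sort largest first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()
    return eigenvalues, eigenvectors, explained

# Hypothetical yield history: a large 'level' factor, a smaller 'slope'
# factor and a little independent noise at each maturity.
rng = np.random.default_rng(1)
maturities = np.array([2.0, 5.0, 10.0, 30.0])   # assumed reference points
days = 500
span = maturities.max() - maturities.min()
level = rng.normal(scale=0.06, size=(days, 1))
slope = rng.normal(scale=0.02, size=(days, 1)) * (maturities - maturities.mean()) / span
noise = rng.normal(scale=0.005, size=(days, 4))
yields = 5.0 + np.cumsum(level + slope + noise, axis=0)

eigenvalues, shifts, explained = principal_yield_shifts(yields)
first = shifts[:, 0]   # dominant 'fundamental yield curve shift'
```

On data simulated this way, the first column of shifts has components of the same sign and similar size (a near-parallel shift), and explained[0] carries most of the weight. Real yield data would of course replace the simulated array.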

For a toy example, see Table 5.1. The imaginary data set consists of five days of
observed daily yield changes at four unnamed reference maturities; for example, on
days 1 and 3 a perfectly parallel shift occurred. The correlation matrix shows that
yield shifts at different maturity points are quite correlated. Inspecting the eigenvalues
and eigenvectors shows that, at least according to principal component
analysis, there is a dominant yield curve shift, eigenvector (D), which represents an
almost parallel shift: each maturity point moves by about 0.5. The second most
important eigenvector (C) seems to represent a slope shift or ‘yield curve tilt’. The
third eigenvector (B) seems to appear because of the inclusion of day 5 in the data set.
Note that the results might not perfectly reflect one’s intuition. First, the dominant
shift (D) is not perfectly parallel, even though two perfectly parallel shifts were
included in the data set. Second, the shift that occurred on day 2 is regarded as a
combination of a parallel shift (D) and a slope shift (C), not a slope shift alone; shift
(C) has almost the same shape as the observed shift on day 2, but it has been
‘translated’ so that shifts of type (C) are uncorrelated with shifts of type (D). Third,
eigenvector (A) seems to have no interpretation. Finally, the weight attached to (D)
seems very high – this is because the actual shifts on all five days are regarded as
having a parallel component, as we just noted.
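
The statement that each day's shift is regarded as a combination of uncorrelated fundamental shifts can be made concrete. The sketch below uses a small hypothetical data set (it is not the data of Table 5.1) and works with standardised yield changes, since the analysis is based on the correlation matrix. Each day's standardised change is projected onto the eigenvectors; the resulting scores rebuild the original changes exactly and are uncorrelated across days.

```python
import numpy as np

# Hypothetical daily yield changes in basis points at four maturities.
# These numbers are invented for illustration; NOT the data of Table 5.1.
changes = np.array([
    [ 5.0,  5.0,  5.0,  5.0],   # perfectly parallel shift
    [-2.0, -1.0,  1.0,  2.0],   # pure tilt
    [-3.0, -3.0, -3.0, -3.0],   # perfectly parallel shift
    [ 1.0,  2.0,  2.0,  4.0],
    [ 2.0, -1.0,  0.0,  1.0],
    [-4.0, -2.0, -3.0, -1.0],
])

# Standardise each maturity (zero mean, unit sample standard deviation),
# which is what working with the correlation matrix implies.
z = (changes - changes.mean(axis=0)) / changes.std(axis=0, ddof=1)

corr = np.corrcoef(changes, rowvar=False)
eigenvalues, V = np.linalg.eigh(corr)   # ascending order, columns of V

# Score of each day on each eigenvector: how much of that fundamental
# shift is present in the day's standardised change.
scores = z @ V

# The eigenvectors form an orthonormal basis, so the scores rebuild z.
rebuilt = scores @ V.T

# Scores on different eigenvectors are uncorrelated across days, and the
# variance of each score equals the corresponding eigenvalue.
score_cov = np.cov(scores, rowvar=False)
```

Inspecting the row of scores for the pure tilt day shows that it generally loads on more than one eigenvector, which is the effect noted above for day 2 of the toy example.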
A technical point: in theory, one could use the covariance matrix rather than the
correlation matrix in the analysis. However, using the correlation matrix is preferable
when observed correlations are more stable than observed covariances – which is
usually the case in financial data where volatilities are quite unstable. (For further
discussion, see Buhler and Zimmermann, 1996.) In the example of Table 5.1, very
similar results are obtained using the covariance matrix.
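
One way to see why unstable volatilities matter is to rescale a single series and compare the two matrices. The sketch below uses simulated, hypothetical data: multiplying the third series by five (a volatility change) leaves the correlation matrix unchanged, but makes the dominant eigenvector of the covariance matrix point almost entirely at that series. The helper top_eigenvector is an illustrative name, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical changes: three series sharing one common factor.
common = rng.normal(size=(300, 1))
changes = common + 0.4 * rng.normal(size=(300, 3))

# Same data, but the third series is made five times as volatile.
scaled = changes * np.array([1.0, 1.0, 5.0])

# Correlations are unaffected by rescaling a series.
corr_same = np.allclose(np.corrcoef(changes, rowvar=False),
                        np.corrcoef(scaled, rowvar=False))

def top_eigenvector(matrix):
    """Eigenvector of the largest eigenvalue, sign chosen so it sums > 0."""
    _, vectors = np.linalg.eigh(matrix)   # ascending eigenvalue order
    top = vectors[:, -1]
    return top * np.sign(top.sum())

# The covariance-based dominant shift changes shape under the rescaling.
cov_top_before = top_eigenvector(np.cov(changes, rowvar=False))
cov_top_after = top_eigenvector(np.cov(scaled, rowvar=False))
```

Before rescaling, the three components of the dominant covariance eigenvector are of similar size; afterwards the third component dominates, even though the co-movement between the series is the same.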
Table 5.2 shows the result of a principal component analysis carried out on actual
US Treasury bond yield data from 1993 to 1998. In this case the dominant shift is a
virtually parallel shift, which explains over 90% of observed fluctuations in bond
yields. The second most important shift is a slope shift or tilt in which short yields
fall and long yields rise (or vice versa). The third shift is a kind of curvature shift, in which short and long yields rise while mid-range yields fall (or vice versa); the
remaining eigenvectors have no meaningful interpretation and are statistically
insignificant.
Note that meaningful results will only be obtained if a consistent set of yields is
used: in this case, constant maturity Treasury yields regarded as a proxy for a
Treasury par yield curve. Yields on physical bonds should not be used, since the
population of bonds both ages and changes composition over time. The analysis here
has been carried out using CMT yields reported by the US Federal Reserve Bank.
An alternative is to use a dataset consisting of historical swap rates, which are par
yields by definition. The results of the analysis turn out to be very similar.
