Funciones Ortogonales Empíricas (EOF)

EOF vs Componentes Principales

En esencia las EOF son componentes principales calculadas sobre una matriz cada una de cuyas columnas es una serie temporal, correspondiendo todas las series temporales a medidas de la misma variable realizadas normalmente en un sitio distinto.

En el primer ejemplo que veremos a continuación las EOF se emplean para describir la evolucion de la concentración de la clorofila en la bahía de San Francisco desde el año 1978 hasta el 2009, a partir de los datos recogidos por 16 estaciones de medida (boyas) situadas en la bahía. Los datos en este caso son una matriz de valores de clorofila, cada una de cuyas filas es un mes y cada columna una estación de medida.

EOF con el paquete wql: Análisis de la clorofila en la Bahía de San Francisco

library(wql)

La base de datos contiene los valores medios mensuales de clorofila medidos en 16 estaciones situadas en la Bahía de San Francisco desde enero de 1978 hasta agosto del 2009:

chla <- sfbayChla
time(chla) # Instantes de las observaciones
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
1978 1978.000 1978.083 1978.167 1978.250 1978.333 1978.417 1978.500 1978.583
1979 1979.000 1979.083 1979.167 1979.250 1979.333 1979.417 1979.500 1979.583
1980 1980.000 1980.083 1980.167 1980.250 1980.333 1980.417 1980.500 1980.583
1981 1981.000 1981.083 1981.167 1981.250 1981.333 1981.417 1981.500 1981.583
1982 1982.000 1982.083 1982.167 1982.250 1982.333 1982.417 1982.500 1982.583
1983 1983.000 1983.083 1983.167 1983.250 1983.333 1983.417 1983.500 1983.583
1984 1984.000 1984.083 1984.167 1984.250 1984.333 1984.417 1984.500 1984.583
1985 1985.000 1985.083 1985.167 1985.250 1985.333 1985.417 1985.500 1985.583
1986 1986.000 1986.083 1986.167 1986.250 1986.333 1986.417 1986.500 1986.583
1987 1987.000 1987.083 1987.167 1987.250 1987.333 1987.417 1987.500 1987.583
1988 1988.000 1988.083 1988.167 1988.250 1988.333 1988.417 1988.500 1988.583
1989 1989.000 1989.083 1989.167 1989.250 1989.333 1989.417 1989.500 1989.583
1990 1990.000 1990.083 1990.167 1990.250 1990.333 1990.417 1990.500 1990.583
1991 1991.000 1991.083 1991.167 1991.250 1991.333 1991.417 1991.500 1991.583
1992 1992.000 1992.083 1992.167 1992.250 1992.333 1992.417 1992.500 1992.583
1993 1993.000 1993.083 1993.167 1993.250 1993.333 1993.417 1993.500 1993.583
1994 1994.000 1994.083 1994.167 1994.250 1994.333 1994.417 1994.500 1994.583
1995 1995.000 1995.083 1995.167 1995.250 1995.333 1995.417 1995.500 1995.583
1996 1996.000 1996.083 1996.167 1996.250 1996.333 1996.417 1996.500 1996.583
1997 1997.000 1997.083 1997.167 1997.250 1997.333 1997.417 1997.500 1997.583
1998 1998.000 1998.083 1998.167 1998.250 1998.333 1998.417 1998.500 1998.583
1999 1999.000 1999.083 1999.167 1999.250 1999.333 1999.417 1999.500 1999.583
2000 2000.000 2000.083 2000.167 2000.250 2000.333 2000.417 2000.500 2000.583
2001 2001.000 2001.083 2001.167 2001.250 2001.333 2001.417 2001.500 2001.583
2002 2002.000 2002.083 2002.167 2002.250 2002.333 2002.417 2002.500 2002.583
2003 2003.000 2003.083 2003.167 2003.250 2003.333 2003.417 2003.500 2003.583
2004 2004.000 2004.083 2004.167 2004.250 2004.333 2004.417 2004.500 2004.583
2005 2005.000 2005.083 2005.167 2005.250 2005.333 2005.417 2005.500 2005.583
2006 2006.000 2006.083 2006.167 2006.250 2006.333 2006.417 2006.500 2006.583
2007 2007.000 2007.083 2007.167 2007.250 2007.333 2007.417 2007.500 2007.583
2008 2008.000 2008.083 2008.167 2008.250 2008.333 2008.417 2008.500 2008.583
2009 2009.000 2009.083 2009.167 2009.250 2009.333 2009.417 2009.500 2009.583
          Sep      Oct      Nov      Dec
1978 1978.667 1978.750 1978.833 1978.917
1979 1979.667 1979.750 1979.833 1979.917
1980 1980.667 1980.750 1980.833 1980.917
1981 1981.667 1981.750 1981.833 1981.917
1982 1982.667 1982.750 1982.833 1982.917
1983 1983.667 1983.750 1983.833 1983.917
1984 1984.667 1984.750 1984.833 1984.917
1985 1985.667 1985.750 1985.833 1985.917
1986 1986.667 1986.750 1986.833 1986.917
1987 1987.667 1987.750 1987.833 1987.917
1988 1988.667 1988.750 1988.833 1988.917
1989 1989.667 1989.750 1989.833 1989.917
1990 1990.667 1990.750 1990.833 1990.917
1991 1991.667 1991.750 1991.833 1991.917
1992 1992.667 1992.750 1992.833 1992.917
1993 1993.667 1993.750 1993.833 1993.917
1994 1994.667 1994.750 1994.833 1994.917
1995 1995.667 1995.750 1995.833 1995.917
1996 1996.667 1996.750 1996.833 1996.917
1997 1997.667 1997.750 1997.833 1997.917
1998 1998.667 1998.750 1998.833 1998.917
1999 1999.667 1999.750 1999.833 1999.917
2000 2000.667 2000.750 2000.833 2000.917
2001 2001.667 2001.750 2001.833 2001.917
2002 2002.667 2002.750 2002.833 2002.917
2003 2003.667 2003.750 2003.833 2003.917
2004 2004.667 2004.750 2004.833 2004.917
2005 2005.667 2005.750 2005.833 2005.917
2006 2006.667 2006.750 2006.833 2006.917
2007 2007.667 2007.750 2007.833 2007.917
2008 2008.667 2008.750 2008.833 2008.917
2009                                    

Gráficos de algunas de estas series:

library(ggfortify)
Loading required package: ggplot2
autoplot(chla,ncol=4)
Warning: Removed 8 rows containing missing values (`geom_line()`).

Nos interesa la evolución anual; promediamos para cada año:

# Create an annual matrix time series
chla1 <- aggregate(sfbayChla, 1, mean, na.rm = TRUE)
autoplot(chla1,ncol=4)

Las últimas estaciones tienen valores perdidos; las descartamos:

chla1 <- chla1[, 1:12]  # remove stations with missing years

Elegimos el número adecuado de EOFs:

eofNum(chla1)

Construimos las EOF, seleccionando solo 1 tal como sugiere eofNum:

# eofNum (see examples) suggests n = 1
EOF <- eof(chla1, 1)

Representación gráfica de la EOF:

plot(as.numeric(rownames(EOF$amplitude)),EOF$amplitude,type="l",
     xlab="Year",ylab="Amplitude 1st mode")

Se podía haber obtenido el mismo resultado con prcomp (de hecho EOF usa prcomp): aquí vemos como coinciden varianzas y sus porcentajes acumulados:

pc <- prcomp(chla1,scale. = TRUE)
summary(pc)
Importance of components:
                          PC1     PC2    PC3     PC4     PC5     PC6     PC7
Standard deviation     3.0663 0.98683 0.9191 0.64721 0.40521 0.24946 0.18941
Proportion of Variance 0.7835 0.08115 0.0704 0.03491 0.01368 0.00519 0.00299
Cumulative Proportion  0.7835 0.86468 0.9351 0.96999 0.98367 0.98886 0.99185
                           PC8     PC9    PC10   PC11    PC12
Standard deviation     0.18700 0.15382 0.14163 0.1147 0.07751
Proportion of Variance 0.00291 0.00197 0.00167 0.0011 0.00050
Cumulative Proportion  0.99476 0.99673 0.99840 0.9995 1.00000
EOF$variance
 [1]  78.4  86.5  93.5  97.0  98.4  98.9  99.2  99.5  99.7  99.8  99.9 100.0
EOF$eigen.pct
 [1] 78.4  8.1  7.0  3.5  1.4  0.5  0.3  0.3  0.2  0.2  0.1  0.1

La función eofNum es básicamente lo mismo que eofNum:

screeplot(pc)

eofNum(chla1)

Las coordenadas de la EOF son una versión escalada a media 0 y desviación típica 1 de la primera componente principal:

pe <- data.frame(pc1=pc$x[,1],scaledPC1=scale(pc$x[,1],scale = TRUE),eof=EOF$amplitude)
pe$quotient=pe$scaledPC1/pe$pc1
pe
             pc1   scaledPC1        EOF1  quotient
1978 -3.71779761 -1.21246216 -1.21246216 0.3261238
1979 -3.31653011 -1.08159929 -1.08159929 0.3261238
1980 -3.66943342 -1.19668945 -1.19668945 0.3261238
1981 -2.94304599 -0.95979724 -0.95979724 0.3261238
1982 -2.72889938 -0.88995894 -0.88995894 0.3261238
1983  0.05732382  0.01869466  0.01869466 0.3261238
1984 -2.02038749 -0.65889638 -0.65889638 0.3261238
1985 -1.89260439 -0.61722327 -0.61722327 0.3261238
1986 -0.30543129 -0.09960840 -0.09960840 0.3261238
1987 -4.18310354 -1.36420948 -1.36420948 0.3261238
1988 -2.38621346 -0.77820092 -0.77820092 0.3261238
1989 -1.07971835 -0.35212181 -0.35212181 0.3261238
1990 -0.90950909 -0.29661253 -0.29661253 0.3261238
1991 -3.05696910 -0.99695028 -0.99695028 0.3261238
1992 -2.71675623 -0.88599877 -0.88599877 0.3261238
1993 -0.64605278 -0.21069317 -0.21069317 0.3261238
1994 -2.17668147 -0.70986756 -0.70986756 0.3261238
1995  2.32949446  0.75970351  0.75970351 0.3261238
1996 -2.59126388 -0.84507273 -0.84507273 0.3261238
1997  0.86074181  0.28070836  0.28070836 0.3261238
1998  3.36503739  1.09741867  1.09741867 0.3261238
1999  2.70185298  0.88113847  0.88113847 0.3261238
2000  3.23687896  1.05562316  1.05562316 0.3261238
2001  2.78555990  0.90843729  0.90843729 0.3261238
2002  1.53489367  0.50056531  0.50056531 0.3261238
2003  5.42194986  1.76822671  1.76822671 0.3261238
2004  1.40560267  0.45840044  0.45840044 0.3261238
2005  0.58759133  0.19162750  0.19162750 0.3261238
2006  6.27913918  2.04777652  2.04777652 0.3261238
2007  3.92929848  1.28143762  1.28143762 0.3261238
2008  5.84503308  1.90620420  1.90620420 0.3261238
rt <- data.frame(pc1=pc$rotation[,1],eof=EOF$REOF)
rt$quotient <- rt$pc/rt$EOF1
rt
          pc1      EOF1  quotient
s21 0.2984840 0.9152475 0.3261238
s22 0.2875436 0.8817009 0.3261238
s23 0.3074099 0.9426172 0.3261238
s24 0.3038324 0.9316475 0.3261238
s25 0.3013699 0.9240967 0.3261238
s26 0.2686399 0.8237361 0.3261238
s27 0.3116476 0.9556114 0.3261238
s28 0.2791966 0.8561061 0.3261238
s29 0.3042674 0.9329812 0.3261238
s30 0.2931426 0.8988690 0.3261238
s31 0.2549798 0.7818497 0.3261238
s32 0.2445793 0.7499585 0.3261238

 

 

EOF con el paquete metR

library(metR)
# The Antarctic Oscillation is computed from the
# monthly geopotential height anomalies weighted by latitude.
library(data.table)
data(geopotential)
geopotential <- copy(geopotential)
geopotential[, gh.t.w := Anomaly(gh)*sqrt(cos(lat*pi/180)),
      by = .(lon, lat, month(date))]
head(geopotential)
    lon   lat lev       gh       date    gh.t.w
1:  0.0 -22.5 700 3163.839 1990-01-01 -3.824174
2:  2.5 -22.5 700 3162.516 1990-01-01 -3.591582
3:  5.0 -22.5 700 3162.226 1990-01-01 -2.862909
4:  7.5 -22.5 700 3162.323 1990-01-01 -2.403045
5: 10.0 -22.5 700 3163.097 1990-01-01 -2.067122
6: 12.5 -22.5 700 3164.516 1990-01-01 -1.844855

En esta base de datos hay 290304 observaciones correspondientes a 72 meses (6 años) de observación de una rejilla de 4032 puntos de (longitud,latitud) (72x4032 = 290304); es decir se han medido las variables level y gh (anomalías de altura geopotenciales) en todos los puntos de la malla durante 72 meses (6 años). La variable gh.t.w se ha calculado a partir de gh ponderando por latitud. Esta última variable es la que resulta de interés.

length(unique(geopotential[,date]))
[1] 72
nrow(unique(geopotential[,lon:lat]))
[1] 4032

Esta base de datos se puede reordenar de forma que tenga 4032 filas (una por cada posición en la rejilla) y 72 columnas (una por cada día), y aplicar prcomp para hallar las componentes principales (EOFs). La función EOF del paquete metR permite hacer este cálculo de manera eficiente sin tener que estar reformateando los datos para darles la forma de esta matriz:

eof <- EOF(gh.t.w ~ lat + lon | date, NULL, data = geopotential,
           B = 100, probs = c(low = 0.1, hig = 0.9))

Como vemos, por defecto salen 72 componentes principales:

# Inspect the explained variance of each component
summary(eof)
Importancia de las componentes:
Componente Varianza explicada Varianza acumulada
      1                   32%                 32%
      2                   11%                 43%
      3                    8%                 51%
      4                    7%                 58%
      5                    6%                 64%
      6                    5%                 69%
      7                    4%                 73%
      8                    4%                 77%
      9                    3%                 80%
     10                    3%                 83%
     11                    2%                 85%
     12                    2%                 87%
     13                    2%                 89%
     14                    1%                 90%
     15                    1%                 91%
     16                    1%                 92%
     17                    1%                 93%
     18                    1%                 94%
     19                    1%                 94%
     20                    1%                 95%
     21                    1%                 95%
     22                    0%                 96%
     23                    0%                 96%
     24                    0%                 97%
     25                    0%                 97%
     26                    0%                 97%
     27                    0%                 98%
     28                    0%                 98%
     29                    0%                 98%
     30                    0%                 98%
     31                    0%                 98%
     32                    0%                 99%
     33                    0%                 99%
     34                    0%                 99%
     35                    0%                 99%
     36                    0%                 99%
     37                    0%                 99%
     38                    0%                 99%
     39                    0%                 99%
     40                    0%                 99%
     41                    0%                 99%
     42                    0%                 99%
     43                    0%                100%
     44                    0%                100%
     45                    0%                100%
     46                    0%                100%
     47                    0%                100%
     48                    0%                100%
     49                    0%                100%
     50                    0%                100%
     51                    0%                100%
     52                    0%                100%
     53                    0%                100%
     54                    0%                100%
     55                    0%                100%
     56                    0%                100%
     57                    0%                100%
     58                    0%                100%
     59                    0%                100%
     60                    0%                100%
     61                    0%                100%
     62                    0%                100%
     63                    0%                100%
     64                    0%                100%
     65                    0%                100%
     66                    0%                100%
     67                    0%                100%
     68                    0%                100%
     69                    0%                100%
     70                    0%                100%
     71                    0%                100%
     72                    0%                100%
screeplot(eof)

Visto el screeplot, nos podemos quedar con las primeras 5 componentes, si bien la primera es la que más varianza acumula. Representamos los valores de esta componente frente a longitud y latitud:

# Keep only the 1st.
aao <- cut(eof, 1)
aao
left:
        lat   lon  PC        gh.t.w
   1: -22.5   0.0 PC1  2.318593e-04
   2: -22.5   2.5 PC1  4.667098e-06
   3: -22.5   5.0 PC1 -1.171197e-04
   4: -22.5   7.5 PC1 -1.472483e-04
   5: -22.5  10.0 PC1 -1.123333e-04
  ---                              
4028: -90.0 347.5 PC1  3.067086e-10
4029: -90.0 350.0 PC1  3.067086e-10
4030: -90.0 352.5 PC1  3.067086e-10
4031: -90.0 355.0 PC1  3.067086e-10
4032: -90.0 357.5 PC1  3.067086e-10

right:
          date  PC        gh.t.w
 1: 1990-01-01 PC1  0.0633150325
 2: 1990-02-01 PC1 -0.1145370630
 3: 1990-03-01 PC1 -0.0432131694
 4: 1990-04-01 PC1  0.1926354538
 5: 1990-05-01 PC1  0.1538555485
 6: 1990-06-01 PC1 -0.0893530078
 7: 1990-07-01 PC1  0.0405588938
 8: 1990-08-01 PC1  0.0092101751
 9: 1990-09-01 PC1 -0.1296209983
10: 1990-10-01 PC1 -0.0207790817
11: 1990-11-01 PC1  0.0284246832
12: 1990-12-01 PC1  0.0630467201
13: 1991-01-01 PC1 -0.0681743466
14: 1991-02-01 PC1  0.1043889123
15: 1991-03-01 PC1 -0.0488563808
16: 1991-04-01 PC1  0.0360555173
17: 1991-05-01 PC1  0.0474946944
18: 1991-06-01 PC1  0.0571730689
19: 1991-07-01 PC1  0.0617849450
20: 1991-08-01 PC1  0.0613448909
21: 1991-09-01 PC1  0.0914157721
22: 1991-10-01 PC1  0.0711386221
23: 1991-11-01 PC1  0.0736501548
24: 1991-12-01 PC1  0.2587426737
25: 1992-01-01 PC1  0.0078855522
26: 1992-02-01 PC1  0.2166346442
27: 1992-03-01 PC1  0.1246286704
28: 1992-04-01 PC1  0.0062662242
29: 1992-05-01 PC1  0.2145481541
30: 1992-06-01 PC1  0.1689961784
31: 1992-07-01 PC1 -0.0354560166
32: 1992-08-01 PC1  0.0881697630
33: 1992-09-01 PC1  0.0173748000
34: 1992-10-01 PC1 -0.0001819942
35: 1992-11-01 PC1 -0.0400375449
36: 1992-12-01 PC1 -0.0022350454
37: 1993-01-01 PC1  0.2240684322
38: 1993-02-01 PC1 -0.0368883516
39: 1993-03-01 PC1  0.0399926617
40: 1993-04-01 PC1 -0.0540678881
41: 1993-05-01 PC1 -0.1809682724
42: 1993-06-01 PC1 -0.2269588897
43: 1993-07-01 PC1 -0.3326995460
44: 1993-08-01 PC1 -0.1041614394
45: 1993-09-01 PC1 -0.0637642919
46: 1993-10-01 PC1 -0.1036064956
47: 1993-11-01 PC1 -0.0681045573
48: 1993-12-01 PC1 -0.0875282521
49: 1994-01-01 PC1 -0.0773189675
50: 1994-02-01 PC1 -0.1181704766
51: 1994-03-01 PC1 -0.0814743845
52: 1994-04-01 PC1 -0.0541017867
53: 1994-05-01 PC1 -0.0197324100
54: 1994-06-01 PC1  0.0845209145
55: 1994-07-01 PC1 -0.0346162820
56: 1994-08-01 PC1 -0.1703960070
57: 1994-09-01 PC1  0.1732601374
58: 1994-10-01 PC1  0.0718483282
59: 1994-11-01 PC1  0.0392072510
60: 1994-12-01 PC1 -0.0857888775
61: 1995-01-01 PC1 -0.1497757028
62: 1995-02-01 PC1 -0.0514276652
63: 1995-03-01 PC1  0.0089226026
64: 1995-04-01 PC1 -0.1267875207
65: 1995-05-01 PC1 -0.2151977147
66: 1995-06-01 PC1  0.0056217357
67: 1995-07-01 PC1  0.3004280059
68: 1995-08-01 PC1  0.1158326174
69: 1995-09-01 PC1 -0.0886654194
70: 1995-10-01 PC1 -0.0184193788
71: 1995-11-01 PC1 -0.0331399869
72: 1995-12-01 PC1 -0.1462372188
          date  PC        gh.t.w

sdev:
    PC       sd        r2      low      hig
1: PC1 7050.352 0.3176266 6338.487 8002.174
# AAO field
library(ggplot2)
ggplot(aao$left, aes(lon, lat, z = gh.t.w)) +
    geom_contour(aes(color = after_stat(level))) +
    coord_polar()

El peso de cada mes observado dentro de la composición de esta primera EOF es:

# AAO signal
ggplot(aao$right, aes(date, gh.t.w)) +
    geom_line()

# standard deviation, % of explained variance and
# confidence intervals.
aao$sdev
    PC       sd        r2      low      hig
1: PC1 7050.352 0.3176266 6338.487 8002.174

Usamos las dos primeras EOF para predecir los valores de gh.t.w:

# Reconstructed fields based only on the two first
# principal components
field <- predict(eof, 1:2)

Representamos la predicción (en color) para el mes 1 frente a los datos de anomalía (en lineas de nivel) medidos realmente ese mes:

# Compare it to the real field.
ggplot(field[date == date[1]], aes(lon, lat)) +
    geom_contour_fill(aes(z = gh.t.w), data = geopotential[date == date[1]]) +
    geom_contour2(aes(z = gh.t.w, linetype = factor(-sign(stat(level))))) +
    scale_fill_divergent()
Warning: `stat(level)` was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(level)` instead.
Warning: The following aesthetics were dropped during statistical transformation: z
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?
The following aesthetics were dropped during statistical transformation: z
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.

 

 

Otro ejemplo: Detecting the Dominant patterns of Sea Surface Temperature Anomalies in the Bay of Bengal using the empirical orthogonal functions (EOF)

Ver https://rpubs.com/Shanto_khandoker/662487