Angelo Santana
Academic year 2017/18
Data and problems
Below are some of the datasets we will use throughout this course.
season | size | speed | mxPH | mnO2 | Cl | NO3 | NH4 | oPO4 | PO4 |
---|---|---|---|---|---|---|---|---|---|
winter | small | medium | 8.00 | 9.8 | 60.800 | 6.238 | 578.000 | 105.000 | 170.000 |
spring | small | medium | 8.35 | 8.0 | 57.750 | 1.288 | 370.000 | 428.750 | 558.750 |
autumn | small | medium | 8.10 | 11.4 | 40.020 | 5.330 | 346.667 | 125.667 | 187.057 |
spring | small | medium | 8.07 | 4.8 | 77.364 | 2.302 | 98.182 | 61.182 | 138.700 |
autumn | small | medium | 8.06 | 9.0 | 55.350 | 10.416 | 233.700 | 58.222 | 97.580 |
winter | small | high | 8.25 | 13.1 | 65.750 | 9.248 | 430.000 | 18.250 | 56.667 |
Chla | a1 | a2 | a3 | a4 | a5 | a6 | a7 |
---|---|---|---|---|---|---|---|
50.0 | 0.0 | 0.0 | 0.0 | 0.0 | 34.2 | 8.3 | 0.0 |
1.3 | 1.4 | 7.6 | 4.8 | 1.9 | 6.7 | 0.0 | 2.1 |
15.6 | 3.3 | 53.6 | 1.9 | 0.0 | 0.0 | 0.0 | 9.7 |
1.4 | 3.1 | 41.0 | 18.9 | 0.0 | 1.4 | 0.0 | 1.4 |
10.5 | 9.2 | 2.9 | 7.5 | 0.0 | 7.5 | 4.1 | 1.0 |
28.4 | 15.1 | 14.6 | 1.4 | 0.0 | 22.5 | 12.6 | 2.9 |
año | isla | pais | pasajeros | mes1 | mes2 | mes3 | mes4 | mes5 |
---|---|---|---|---|---|---|---|---|
2013 | Gran Canaria | ESPAÑA | 1794813 | 128762 | 123922 | 147854 | 141651 | 148617 |
2013 | Gran Canaria | ALEMANIA | 796817 | 70458 | 67044 | 73036 | 50662 | 45841 |
2013 | Gran Canaria | REINO UNIDO | 536281 | 28178 | 29397 | 36905 | 41925 | 46962 |
2013 | Gran Canaria | NORUEGA | 367724 | 51240 | 50417 | 51529 | 17914 | 10124 |
2013 | Gran Canaria | SUECIA | 309656 | 43092 | 40538 | 44991 | 16252 | 6293 |
2013 | Gran Canaria | HOLANDA | 175844 | 14370 | 14716 | 15915 | 12876 | 12056 |
mes6 | mes7 | mes8 | mes9 | mes10 | mes11 | mes12 |
---|---|---|---|---|---|---|
154646 | 170756 | 171137 | 155569 | 159404 | 138835 | 153660 |
46550 | 53562 | 52198 | 65061 | 70748 | 99650 | 102007 |
53874 | 57585 | 57266 | 57284 | 51992 | 36317 | 38596 |
12354 | 18708 | 10887 | 13432 | 33837 | 52761 | 44521 |
5821 | 7020 | 5776 | 6502 | 29099 | 54064 | 50208 |
10523 | 19087 | 16906 | 12720 | 16763 | 14036 | 15876 |
TIMESTAMP | RECORD | Temp_degC | EC_uScm | pH | DO_Sat | DO_mg | Turbidity_NTU | Chloraphylla_ugL |
---|---|---|---|---|---|---|---|---|
2015-01-01 00:03:30 | 32933 | 30.46 | 43510 | 7.73 | 88.5 | 5.70 | 3.7 | 4.7 |
2015-01-01 00:33:30 | 32934 | 30.43 | 43530 | 7.75 | 89.6 | 5.77 | 1.8 | 6.0 |
2015-01-01 01:03:30 | 32935 | 30.41 | 43510 | 7.76 | 90.0 | 5.80 | 2.0 | 5.1 |
2015-01-01 01:33:30 | 32936 | 30.36 | 43620 | 7.76 | 89.4 | 5.76 | 2.1 | 4.8 |
2015-01-01 02:03:30 | 32937 | 30.32 | 43680 | 7.75 | 88.6 | 5.71 | 3.4 | 5.8 |
2015-01-01 02:33:30 | 32938 | 30.26 | 43840 | 7.75 | 86.2 | 5.56 | 2.6 | 6.0 |
Data description: tables and graphs.
variable | mean | sd | skewness | kurtosis |
---|---|---|---|---|
pH | 8.012 | 0.5983 | -0.7359 | 2.139 |
Oxygen | 9.118 | 2.391 | -0.9019 | 0.2816 |
Chloride | 43.64 | 46.83 | 3.111 | 16.69 |
Nitrates | 3.282 | 3.776 | 7.396 | 80.29 |
variable | minimum | maximum | 25th percentile | 50th percentile | 75th percentile |
---|---|---|---|---|---|
pH | 5.6 | 9.7 | 7.7 | 8.06 | 8.4 |
Oxygen | 1.5 | 13.4 | 7.725 | 9.8 | 10.8 |
Chloride | 0.222 | 391.5 | 10.98 | 32.73 | 57.82 |
Nitrates | 0.05 | 45.65 | 1.296 | 2.675 | 4.446 |
variable | mean | sd | skewness | kurtosis |
---|---|---|---|---|
species.1 | 16.92 | 21.35 | 1.482 | 1.421 |
species.2 | 7.458 | 11.03 | 2.432 | 7.861 |
species.3 | 4.309 | 6.949 | 2.501 | 7.222 |
species.4 | 1.992 | 4.417 | 6.04 | 49.65 |
variable | minimum | maximum | 25th percentile | 50th percentile | 75th percentile |
---|---|---|---|---|---|
species.1 | 0 | 89.8 | 1.5 | 6.95 | 24.8 |
species.2 | 0 | 72.6 | 0 | 3 | 11.38 |
species.3 | 0 | 42.8 | 0 | 1.55 | 4.925 |
species.4 | 0 | 44.6 | 0 | 0 | 2.4 |
isla | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
---|---|---|---|---|---|---|---|
El Hierro | 57321 | 60324 | 66487 | 64122 | 65674 | 70414 | 77326 |
Fuerteventura | 286077 | 385448 | 418234 | 442381 | 505305 | 597724 | 660237 |
Gran Canaria | 1359710 | 1443685 | 1492454 | 1499380 | 1608642 | 1832489 | 2055642 |
La Gomera | NA | 7185 | 11235 | 11503 | 13557 | 14826 | 16517 |
Lanzarote | 591143 | 646347 | 667271 | 682172 | 741387 | 873822 | 952835 |
La Palma | 276664 | 293283 | 311779 | 299174 | 324072 | 368248 | 419602 |
Tenerife | 1761802 | 1882674 | 1914597 | 1971438 | 2197777 | 2395184 | 2571055 |
2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
---|---|---|---|---|---|---|---|
84285 | 91454 | 96241 | 90640 | 84393 | 83648 | 75666 | 69186 |
728347 | 769007 | 727171 | 628832 | 629384 | 649872 | 520532 | 459735 |
2203970 | 2297418 | 2303023 | 2105051 | 2156604 | 2269553 | 1987626 | 1794813 |
18541 | 19606 | 20193 | 16912 | 15697 | 15879 | 9425 | 11889 |
1044279 | 1118302 | 1056623 | 913372 | 949493 | 1007380 | 840194 | 768309 |
442580 | 460083 | 434431 | 399569 | 378151 | 414009 | 348812 | 288240 |
2676839 | 2794818 | 2768238 | 2521160 | 2500251 | 2518575 | 2253928 | 2066200 |
variable | mean | sd | skewness | kurtosis |
---|---|---|---|---|
Temp_degC | 24.24 | 3.705 | -0.000666 | -1.238 |
DO_Sat | 92.42 | 12.21 | 2.084 | 7.925 |
DO_mg | 6.838 | 0.8902 | 1.38 | 5.939 |
Chlorophyll | 6.349 | 8.048 | 17.87 | 581.7 |
variable | minimum | maximum | 25th percentile | 50th percentile | 75th percentile |
---|---|---|---|---|---|
Temp_degC | 16.07 | 32.71 | 20.84 | 24.37 | 27.56 |
DO_Sat | 47.3 | 178.6 | 85.7 | 89.9 | 96.3 |
DO_mg | 3.55 | 12.96 | 6.3 | 6.81 | 7.25 |
Chlorophyll | -1.2 | 341.5 | 2.4 | 5 | 8 |
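The summary columns used in the tables of this section can be reproduced with a few lines of code. The sketch below is in Python (the slides themselves show R output) and rests on assumptions: skewness and kurtosis are taken as the moment-based coefficients, with kurtosis reported as excess kurtosis, and percentiles use linear interpolation between order statistics. The slides do not say which estimators produced their figures, so small differences are possible. The function name `describe` and the sample values are illustrative only.

```python
from math import sqrt
from statistics import mean, quantiles


def describe(values):
    """Return the summary measures shown in the descriptive tables."""
    n = len(values)
    centre = mean(values)
    # Central moments of order 2, 3 and 4 (divided by n, not n - 1).
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    m4 = sum((v - centre) ** 4 for v in values) / n
    # Quartiles by linear interpolation between order statistics.
    p25, p50, p75 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": centre,
        "sd": sqrt(m2 * n / (n - 1)),      # sample standard deviation
        "skewness": m3 / m2 ** 1.5,        # assumed: moment coefficient
        "kurtosis": m4 / m2 ** 2 - 3,      # assumed: excess kurtosis
        "min": min(values),
        "max": max(values),
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


# Hypothetical sample, not taken from the course datasets.
sample_stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

A strongly right-skewed variable such as Chlorophyll (skewness 17.87) has a mean well above its median, which is why the tables report both moments and percentiles.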
Review of basic statistical inference methods: estimation and hypothesis testing.
Mean daytime temperature | Mean nighttime temperature |
---|---|
24.5 | 23.93 |
Mean daytime temperature | Mean nighttime temperature |
---|---|
[24.42,24.58] | [23.85,24.02] |
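As a rough illustration of how an interval like those in the table can be obtained from a sample, here is a minimal Python sketch. It assumes a large-sample normal approximation (mean ± z · sd/√n); the slides do not state which method produced the intervals shown, and the readings below are hypothetical, not the course data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical temperature readings in degrees Celsius.
readings = [24.1, 24.6, 24.4, 24.9, 24.5, 24.3, 24.7, 24.5]
lower, upper = mean_confidence_interval(readings)
```

With a small sample such as this one, a Student t critical value would normally replace z; the normal quantile is used here only because it is available in the Python standard library.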
Constructing confidence intervals for the mean requires:
Is the mean pH in European rivers higher in winter than in summer?
season | mxPH |
---|---|
winter | 8.119 |
summer | 7.905 |
Is this difference significant?
##
## Welch Two Sample t-test
##
## data: mxPH by season
## t = 2.0681, df = 96.051, p-value = 0.02066
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.04212935 Inf
## sample estimates:
## mean in group winter mean in group summer
## 8.119180 7.905222
##
## Wilcoxon rank sum test with continuity correction
##
## data: mxPH by season
## W = 1646, p-value = 0.04035
## alternative hypothesis: true location shift is greater than 0
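The Welch statistic and its degrees of freedom reported above follow from the group means, variances and sizes alone. The Python sketch below computes both with the Welch-Satterthwaite approximation. The group sizes behind the output are not given in the slides, so the example uses hypothetical data, and `welch_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group.
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical samples with unequal variances.
t_stat, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The fractional degrees of freedom (96.051 in the output above) are characteristic of this approximation. Turning t into a p-value requires the Student t distribution, which the Python standard library does not provide.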
Is the dissolved oxygen concentration lower in winter than in summer?
season | mnO2 |
---|---|
winter | 8.88 |
summer | 9.415 |
Is this difference significant?
##
## Welch Two Sample t-test
##
## data: mnO2 by season
## t = -1.2242, df = 101.29, p-value = 0.1119
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 0.1904291
## sample estimates:
## mean in group winter mean in group summer
## 8.879839 9.414773
##
## Wilcoxon rank sum test with continuity correction
##
## data: mnO2 by season
## W = 1163, p-value = 0.09925
## alternative hypothesis: true location shift is less than 0
What happens if we change the question while keeping the same data?
Does the mean pH differ between winter and summer?
##
## Welch Two Sample t-test
##
## data: mxPH by season
## t = 2.0681, df = 96.051, p-value = 0.04132
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.008599169 0.419317042
## sample estimates:
## mean in group winter mean in group summer
## 8.119180 7.905222
##
## Wilcoxon rank sum test with continuity correction
##
## data: mxPH by season
## W = 1646, p-value = 0.08071
## alternative hypothesis: true location shift is not equal to 0
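The two Welch outputs for pH share the same t statistic; only the alternative hypothesis changes. For a test statistic with a symmetric null distribution, the two-sided p-value is twice the smaller one-sided tail, which is the relation between 0.02066 and 0.04132 in the outputs. A minimal Python sketch of that relation (the function name is illustrative):

```python
def two_sided_p(one_sided_p):
    """Two-sided p-value from a one-sided one, for a symmetric null distribution."""
    return 2 * min(one_sided_p, 1 - one_sided_p)


# One-sided p-value of the Welch test for pH (winter > summer), from the output.
p_two_sided = two_sided_p(0.02066)
```

At the 5% level the Welch test rejects under both alternatives, whereas the Wilcoxon test rejects for the one-sided question (p = 0.04035) but not for the two-sided one (p = 0.08071).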
Topic 2. Linear models
season | mean | sd | minimum | maximum | 25th percentile | 50th percentile | 75th percentile |
---|---|---|---|---|---|---|---|
winter | 8.119 | 0.5328 | 6.6 | 9.7 | 7.8 | 8.1 | 8.43 |
spring | 8.024 | 0.6824 | 5.6 | 9.5 | 7.79 | 8.07 | 8.4 |
summer | 7.905 | 0.5218 | 6.4 | 8.8 | 7.6 | 8 | 8.2 |
autumn | 7.952 | 0.6462 | 5.7 | 8.87 | 7.587 | 8.06 | 8.4 |
## Df Sum Sq Mean Sq F value Pr(>F)
## season 3 1.37 0.4559 1.279 0.283
## Residuals 195 69.51 0.3565
## 1 observation deleted due to missingness
##
## Kruskal-Wallis rank sum test
##
## data: mxPH by season
## Kruskal-Wallis chi-squared = 2.9487, df = 3, p-value = 0.3996
size | mean | sd | minimum | maximum | 25th percentile | 50th percentile | 75th percentile |
---|---|---|---|---|---|---|---|
small | 7.657 | 0.6313 | 5.6 | 8.7 | 7.41 | 7.795 | 8.068 |
medium | 8.101 | 0.4845 | 7.3 | 9.7 | 7.8 | 8.1 | 8.408 |
large | 8.396 | 0.4198 | 7.3 | 9.5 | 8.2 | 8.4 | 8.6 |
## Df Sum Sq Mean Sq F value Pr(>F)
## size 2 11501 5751 14.3 1.58e-06 ***
## Residuals 197 79194 402
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Kruskal-Wallis rank sum test
##
## data: a1 by size
## Kruskal-Wallis chi-squared = 24.565, df = 2, p-value = 4.632e-06
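The columns of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the factor mean square to the residual mean square. The Python sketch below checks this with the figures printed for `size`; the function name is illustrative.

```python
def anova_f(ss_factor, df_factor, ss_residual, df_residual):
    """Mean squares and F statistic of a one-way ANOVA table."""
    ms_factor = ss_factor / df_factor
    ms_residual = ss_residual / df_residual
    return ms_factor, ms_residual, ms_factor / ms_residual


# Sums of squares and degrees of freedom from the table for a1 by size.
ms_size, ms_resid, f_size = anova_f(11501, 2, 79194, 197)
```

The result agrees with the printed F value of 14.3 up to the rounding of the sums of squares. The p-value then comes from the F distribution with 2 and 197 degrees of freedom.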
predictor | correlation | intercept | slope |
---|---|---|---|
mxPH | -0.2651 | 92.67 | -9.466 |
mnO2 | 0.2874 | -6.44 | 2.528 |
Cl | -0.3712 | 23.09 | -0.1635 |
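Correlation, intercept and slope in a simple regression are tied together: the slope equals the correlation times the ratio of standard deviations (response over predictor), and the intercept makes the line pass through the point of means. The Python sketch below computes all three from raw data and then applies the slope identity to the summary figures given earlier. It assumes the response is the abundance of species 1 (sd 21.35) and that the pH summaries come from the same observations, which the slides do not state. The toy data and function name are hypothetical.

```python
from math import sqrt


def simple_regression(x, y):
    """Pearson correlation, intercept and slope of the least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sxy / sqrt(sxx * syy), intercept, slope


# Hypothetical toy data.
r, intercept, slope = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Slope identity applied to the summaries in the slides (assumed pairing):
# correlation with mxPH times sd(species 1) divided by sd(pH).
slope_from_summary = -0.2651 * 21.35 / 0.5983
```

The value obtained from the summaries is close to the tabulated slope of -9.466; the small gap is plausibly due to rounding and to missing values being handled differently in each table.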
##
## Call:
## lm(formula = a1 ~ mxPH + mnO2 + Cl + NO3 + NH4 + PO4 + Chla,
## data = algae2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.467 -11.805 -2.308 6.493 68.683
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 55.0491165 25.9525262 2.121 0.035310 *
## mxPH -4.0245404 3.1501630 -1.278 0.203085
## mnO2 0.8994945 0.6356089 1.415 0.158786
## Cl -0.0469367 0.0321062 -1.462 0.145546
## NO3 -1.6925399 0.5361443 -3.157 0.001877 **
## NH4 0.0018216 0.0009671 1.884 0.061273 .
## PO4 -0.0495251 0.0124529 -3.977 0.000102 ***
## Chla -0.0878041 0.0737148 -1.191 0.235206
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.26 on 176 degrees of freedom
## Multiple R-squared: 0.3057, Adjusted R-squared: 0.2781
## F-statistic: 11.07 on 7 and 176 DF, p-value: 1.443e-11
Model-predicted values versus observed values:
##
## Call:
## lm(formula = a1 ~ season + size + mxPH + mnO2 + Cl + NO3 + NH4 +
## PO4 + Chla, data = algae2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -38.017 -11.482 -3.023 6.195 66.930
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43.306328 27.406865 1.580 0.11593
## seasonspring -3.075604 3.419370 -0.899 0.36967
## seasonsummer -2.913582 3.564185 -0.817 0.41480
## seasonautumn -1.871303 3.893744 -0.481 0.63142
## sizemedium -5.671874 3.215836 -1.764 0.07956 .
## sizelarge -8.496974 3.974718 -2.138 0.03396 *
## mxPH -1.671419 3.387717 -0.493 0.62238
## mnO2 0.736770 0.670315 1.099 0.27325
## Cl -0.038344 0.032260 -1.189 0.23625
## NO3 -1.574966 0.547239 -2.878 0.00451 **
## NH4 0.001647 0.000975 1.689 0.09297 .
## PO4 -0.051695 0.012509 -4.133 5.6e-05 ***
## Chla -0.075058 0.073655 -1.019 0.30963
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.17 on 171 degrees of freedom
## Multiple R-squared: 0.3327, Adjusted R-squared: 0.2859
## F-statistic: 7.106 on 12 and 171 DF, p-value: 1.974e-10
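The adjusted R-squared penalises R-squared for the number of predictors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The Python sketch below checks this against both model summaries, taking n as the residual degrees of freedom plus the number of estimated coefficients (176 + 8 = 171 + 13 = 184). The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a linear model with an intercept."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_obs = 176 + 8  # residual df + coefficients (including the intercept)

# Model with the 7 chemical predictors only.
adj_chemical = adjusted_r_squared(0.3057, n_obs, 7)
# Model that also includes season and size (12 coefficients besides the intercept).
adj_full = adjusted_r_squared(0.3327, n_obs, 12)
```

Adding season and size raises R-squared from 0.3057 to 0.3327, but the adjusted value only moves from 0.2781 to 0.2859, so the five extra coefficients contribute little.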
Model-predicted values versus observed values:
Number of observations in which an algal bloom of species 1 occurs (abundance > 60)
##
## No bloom bloom
## 188 12
##
## Call:
## glm(formula = bloom1 ~ size + NO3 + NH4 + PO4, family = binomial,
## data = algae2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8587 -0.3342 -0.1015 -0.0001 2.5432
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.604e-01 5.657e-01 -1.167 0.243
## sizemedium -2.770e-01 8.372e-01 -0.331 0.741
## sizelarge -1.719e+01 1.498e+03 -0.011 0.991
## NO3 -1.640e-01 2.328e-01 -0.704 0.481
## NH4 -3.197e-03 5.326e-03 -0.600 0.548
## PO4 -1.298e-02 8.721e-03 -1.488 0.137
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 90.413 on 196 degrees of freedom
## Residual deviance: 66.062 on 191 degrees of freedom
## AIC: 78.062
##
## Number of Fisher Scoring iterations: 18
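The coefficients of a logistic regression are on the log-odds scale; a predicted probability is obtained by applying the inverse logit to the linear predictor. The Python sketch below uses the estimates printed above. The input values for NO3, NH4 and PO4 are hypothetical and chosen only for illustration, and `bloom_probability` is not a function from the course material. It also checks that, for a binary response, the reported AIC equals the residual deviance plus twice the number of coefficients.

```python
from math import exp

# Estimates copied from the glm output (log-odds scale).
COEF = {
    "(Intercept)": -0.6604,
    "sizemedium": -0.2770,
    "sizelarge": -17.19,
    "NO3": -0.1640,
    "NH4": -0.003197,
    "PO4": -0.01298,
}


def bloom_probability(size, no3, nh4, po4):
    """Predicted bloom probability; 'small' is the reference size level."""
    eta = (COEF["(Intercept)"] + COEF["NO3"] * no3
           + COEF["NH4"] * nh4 + COEF["PO4"] * po4)
    if size == "medium":
        eta += COEF["sizemedium"]
    elif size == "large":
        eta += COEF["sizelarge"]
    return 1 / (1 + exp(-eta))  # inverse logit


# Hypothetical nutrient values for a small and a large river.
p_small = bloom_probability("small", no3=1.0, nh4=10.0, po4=10.0)
p_large = bloom_probability("large", no3=1.0, nh4=10.0, po4=10.0)

# AIC = residual deviance + 2 * number of estimated coefficients.
aic = 66.062 + 2 * len(COEF)
```

The very large negative estimate for `sizelarge`, together with its standard error of about 1498, drives the predicted probability for large rivers to essentially zero, so that coefficient should be read with caution.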