library(readxl)
library(INLA)
library(quantreg)
library(lme4)
library(tidyverse)
library(flextable)
library(plot3D)
library(epibasix)
library(janitor)
library(lubridate)
library(knitr)
library(kableExtra)
inla.setOption(short.summary = TRUE)
Course materials
The data files and R code used in this course can be downloaded from this link.
A simple linear regression model
For a dataset \(\left\{ (x_i, y_i) : i = 1, \dots, n \right\}\), we consider the linear regression model \(y_i \sim N\left(\mu_i, \sigma\right)\), where
\[\mu_i = \beta_0 + \beta_1 x_i\] We estimate this model in the Bayesian framework, assuming the priors \(\beta_0 \sim N\left(0, 10^4\right)\) and \(\beta_1 \sim N\left(3, 1\right)\).
First, we simulate a dataset \(\left\{ \left(x_i, y_i\right) : i = 1, \dots, n = 50 \right\}\) obeying the linear regression model \[y_i \sim N\left(5 + x_i,\ \sigma = 1\right)\] Note that \(\beta_0 = 5\) and \(\beta_1 = 1\).
Data simulation:
### Data simulation
set.seed(19941998)
b <- c(5, 1)   # true coefficients: intercept and slope
sg <- 1        # residual standard deviation
n <- 50
x <- seq(0, 10, length.out = n)
mu <- b[1] + b[2] * x
y <- mu + rnorm(n, 0, sg)
dt <- data.frame(x, y)
### Figure
ggplot(dt, aes(x, y)) + geom_point()
The results of frequentist inference are:
### Frequentist inference
ml <- lm(y ~ x)
summary(ml)
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-2.3367 -0.6464 -0.1080 0.6322 2.4538
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.66664 0.28812 16.20 <2e-16 ***
x 1.06518 0.04965 21.45 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.034 on 48 degrees of freedom
Multiple R-squared: 0.9056, Adjusted R-squared: 0.9036
F-statistic: 460.3 on 1 and 48 DF, p-value: < 2.2e-16
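For a direct comparison with the Bayesian credible intervals shown below, we can also extract the frequentist 95% confidence intervals from the same fit with base R's confint(); this short addition is not part of the original script:

### Frequentist 95% confidence intervals for the coefficients
confint(ml)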
And the results of Bayesian inference with vague priors:
#### Bayesian inference
formula <- y ~ 1 + x
ml <- inla(formula, family = "gaussian", data = dt)
ml$summary.fixed[, 1:5]
mean sd 0.025quant 0.5quant 0.975quant
(Intercept) 4.666657 0.28747239 4.1006151 4.666656 5.232700
x 1.065182 0.04953955 0.9676367 1.065182 1.162727
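The fit above uses INLA's default (vague) priors. For reference, the informative priors stated at the beginning, \(\beta_0 \sim N(0, 10^4)\) and \(\beta_1 \sim N(3, 1)\), could be passed through the control.fixed argument; note that INLA parameterizes Gaussian priors on fixed effects by their precision (1/variance). A minimal sketch (the object name mp is illustrative):

#### Priors stated at the start: beta_0 ~ N(0, 10^4), beta_1 ~ N(3, 1)
#### control.fixed takes precisions: prec = 1 / variance
mp <- inla(formula, family = "gaussian", data = dt,
           control.fixed = list(mean.intercept = 0, prec.intercept = 1e-4,
                                mean = 3, prec = 1))
mp$summary.fixed[, 1:5]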
Erratic prior
If we impose on the Bayesian estimation the strong prior belief that the slope is very close to 1.5, we arrive at the following result:
#### Erratic priors
mb <- inla(formula, family = "gaussian", data = dt,
           control.fixed = list(mean = 1.5, prec = 10000,
                                mean.intercept = 1, prec.intercept = 0.0001))
mb$summary.fixed[, 1:5]
mean sd 0.025quant 0.5quant 0.975quant
(Intercept) 2.528839 0.2358034 2.062863 2.529498 2.991121
x 1.492742 0.0100154 1.473102 1.492742 1.512385
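The strength of this prior comes from its precision: since prec is the inverse of the prior variance, prec = 10000 corresponds to a prior standard deviation of only 0.01 around 1.5. A quick check:

#### Prior sd implied by prec = 10000 (sd = 1 / sqrt(prec))
1 / sqrt(10000)
[1] 0.01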
Next we plot this line together with the true regression line and the one fitted by frequentist estimation. We can see how strong an effect our initial prior has had.
### Figure
Bb_0 <- mb$summary.fixed[1, 1]
Bb_1 <- mb$summary.fixed[2, 1]
ggplot(dt, aes(x, y)) +
  geom_point(colour = "#99CCFF") +
  geom_abline(intercept = 5, slope = 1, colour = "#3300FF", linewidth = 1.5) +
  annotate(geom = "text", x = 2.5, y = 16, label = "True regression line", colour = "#3300FF", size = 5) +
  stat_smooth(method = "lm", formula = y ~ x, geom = "smooth", colour = "#FF0000") +
  annotate(geom = "text", x = 2.5, y = 15, label = "Frequentist estimation", colour = "#FF0000", size = 5) +
  geom_abline(intercept = Bb_0, slope = Bb_1, colour = "#33CC99") +
  annotate(geom = "text", x = 2.5, y = 14, label = "Bayesian estimation", colour = "#33CC99", size = 5)
We can also see the posterior density resulting from our insistence that the slope has to be very close to 1.5:
#### Posterior densities
dpost.intercept <- data.frame(mb$marginals.fixed$`(Intercept)`)
dpost.x <- data.frame(mb$marginals.fixed$x)
ggplot(dpost.x, aes(x, y)) +
  geom_line(colour = "#33CC99", linewidth = 1.5) +
  geom_vline(xintercept = 1, colour = "#3300FF")
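We can also quantify how concentrated this posterior is using INLA's marginal utilities on the same object; as a minimal sketch, inla.zmarginal() summarizes the marginal, and inla.pmarginal() gives the posterior probability that the slope exceeds the true value 1:

#### Summaries of the posterior marginal of the slope
inla.zmarginal(mb$marginals.fixed$x)
#### Posterior probability that the slope exceeds the true value beta_1 = 1
1 - inla.pmarginal(1, mb$marginals.fixed$x)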
A closer look at the posterior density:
ggplot(dpost.x, aes(x, y)) +
  geom_line(colour = "blue", linewidth = 1) +
  theme_bw()
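Finally, if we prefer an interval summary of this marginal rather than a plot, a 95% highest-posterior-density interval can be extracted with inla.hpdmarginal(); a short sketch using the same marginal object:

#### 95% HPD interval for the slope under the erratic prior
inla.hpdmarginal(0.95, mb$marginals.fixed$x)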