Vencedoras: Manuela Sales Lima Nascimento (UFRM); Larissa Ávila Matos (UNICAMP); Raquel Aparecida Moreira (USP); Marcele Fonseca Passos (UFPA); Luisa Campos Caldeira Brant (UFMG); Carolina Loureiro Benone (UFPA); Fernanda Rodrigues Soares (UFTM).
13-06-2025
Vencedoras: Manuela Sales Lima Nascimento (UFRM); Larissa Ávila Matos (UNICAMP); Raquel Aparecida Moreira (USP); Marcele Fonseca Passos (UFPA); Luisa Campos Caldeira Brant (UFMG); Carolina Loureiro Benone (UFPA); Fernanda Rodrigues Soares (UFTM).
Introduction to linear mixed models
The R package skewlmm
Challenge 1: within subject serial correlation
Challenge 2: outliers
Challenge 3: skewness
Challenge 4: censored/missing data
Goal: Evaluate the impact of different high-fat diets on mice, two kinds of supplementation were considered, in addition to a standard high-fat diet (control).
Available in the R package skewlmm.
The weight of 52 mice was evaluated at baseline (week 0), and at weeks 1,2,…,10 of treatment.
data(miceweight) head(miceweight, 4)
## treat mouseID week weight ## 1 C 101 0 33.8 ## 2 C 101 1 37.2 ## 3 C 101 2 34.8 ## 4 C 101 3 38.3
help(miceweight)
Linear mixed-effects models (LMMs) are an important class of statistical models that can be used to analyze correlated data. They are particularly useful for handling:
repeated measures data;
clustered data;
longitudinal studies.
Such data are encountered in a variety of fields including biostatistics, public health, psychometrics, educational measurement, sociology and geophysics.
LMMs were introduced by Laird & Ware (1982)
The linear mixed-effects model is defined as
\[ \mathbf{Y}_{i}= \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\epsilon}_{i}, \quad i = 1, \ldots, n, \]
where
\(\mathbf{Y}_i\) : (\(n_i\times 1\)) vector of observations for subject \(i\),
\(\boldsymbol\beta\) : (\(p \times 1\)) vector of fixed effects regression coefficients,
\(\mathbf{X}_i\) : (\(n_i\times p\)) covariate (design) matrix for fixed effects,
\(\mathbf{b}_i\) : (\(q \times 1\)) vector of random effects for subject \(i\),
\(\mathbf{Z}_i\) : (\(n_i\times q\)) covariate (design) matrix for random effects,
\(\boldsymbol{\epsilon}_i\) : (\(n_i\times 1\)) vector of (possibly correlated) error terms for subject \(i\).
Remark:
Fixed effects refer to average effect in population;
Random effects are specific to each subject.
Most commonly, in longitudinal studies we consider random intercepts (\(b_{0i}\)) and random slopes on time (\(b_{1i}\)):
Random intercepts → individuals randomly distributed about population mean intercept,
Random slope → individuals have rates of change randomly distributed about population mean.
In the model, \[ \mathbf{Y}_{i}= \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i \mathbf{b}_i + \boldsymbol{\epsilon}_{i}, \]
we typically assume that the random effects are independent of the error terms and that
\[ \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}), \] \[ \boldsymbol\epsilon_i \sim N(\mathbf{0}, \boldsymbol{\Omega}_i). \]
This assumption yields a covariance structure for the outcomes
\[ \text{Var}(\mathbf{Y}_i) = \boldsymbol{\Sigma}_i= \mathbf{Z}_i \mathbf{DZ}_i^\top + \boldsymbol{\Omega}_i. \]
Remark:
The random effect structure implies correlation between the outcomes!
Our model is
\[ \mathbf{Y}_{i}= \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i \mathbf{b}_i + \boldsymbol{\epsilon}_{i}. \]
If we fit a model with only random intercepts, we have
\[ \mathbf{Y}_{i}= \mathbf{X}_i\boldsymbol\beta + b_{0i} + \boldsymbol\epsilon_{i}. \]
We assume that \(b_{0i} \sim N(0, \sigma^2_b)\) and \(\boldsymbol\epsilon_i \sim N(\mathbf{0}, \sigma^2_e\mathbf{I})\), where
\(\sigma^2_e\) is the amount of random variability (of the residuals),
\(\sigma^2_b\) is the amount of variability in the population around the population average line.
Two main popular functions to fit LMMs in R are:
lme()
in the nlme package: supports several random effects and error level dependence structures;
lmer()
in the lme4 package: has more efficient linear algebra tools and is more efficient for fitting models with crossed random effects, but does not support special dependence structures.
Remark:
Both these functions assume normal distributions for the random terms
(although the function glmer()
in the lme4 package fits generalized linear mixed effect models).
Fitting the model \[ \begin{aligned} \mathbf{y}_{i} = & (b_{0i} +\beta_0+ \beta_1\,\text{treat}_{1i}+ \beta_2\,\text{treat}_{2i})\mathbf{1}_{n_i} \\ & + (b_{1i}+\beta_3+\beta_4\,\text{treat}_{1i}+\beta_5\,\text{treat}_{2i}) {\bf t}_i +\mathbb{\epsilon}_{i} \end{aligned} \] where \(y_{i}\) denotes the weight in grams (g) and \(t_{ij}\) denotes the numbers of weeks of diet, for \(t_{i1}=5, t_{i2}=6, \ldots, t_{i6}=10\), \(i = 1, \ldots, 52\). Also, \(t_{ij}\) is (week - 7.5).
Obs.: Time variable was centered in 0 and the first four weeks were excluded as no treatment was received during that time.
# Using lme4 flme4 <- lmer(formula = weight ~ treat*weekt + (weekt|mouseID), data = miceweight) summary(flme4)
## Linear mixed model fit by REML ['lmerMod'] ## Formula: weight ~ treat * weekt + (weekt | mouseID) ## Data: miceweight ## ## REML criterion at convergence: 1721.4 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.2513 -0.4116 0.0017 0.3597 3.9826 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## mouseID (Intercept) 24.4380 4.9435 ## weekt 0.9716 0.9857 0.29 ## Residual 7.5729 2.7519 ## Number of obs: 312, groups: mouseID, 52 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 43.9939 1.5285 28.782 ## treatT1 0.9156 1.8868 0.485 ## treatT2 -0.9389 1.9030 -0.493 ## weekt 1.7590 0.3573 4.923 ## treatT1:weekt -0.1704 0.4411 -0.386 ## treatT2:weekt -0.5881 0.4448 -1.322 ## ## Correlation of Fixed Effects: ## (Intr) tretT1 tretT2 weekt trtT1: ## treatT1 -0.810 ## treatT2 -0.803 0.651 ## weekt 0.232 -0.188 -0.186 ## treatT1:wkt -0.188 0.232 0.151 -0.810 ## treatT2:wkt -0.186 0.151 0.232 -0.803 0.651
# Using nlme fnlme <- lme(weight ~ treat*weekt, data = miceweight, random = ~ weekt|mouseID) summary(fnlme)
## Linear mixed-effects model fit by REML ## Data: miceweight ## AIC BIC logLik ## 1741.393 1778.629 -860.6965 ## ## Random effects: ## Formula: ~weekt | mouseID ## Structure: General positive-definite, Log-Cholesky parametrization ## StdDev Corr ## (Intercept) 4.9434538 (Intr) ## weekt 0.9857314 0.286 ## Residual 2.7518810 ## ## Fixed effects: weight ~ treat * weekt ## Value Std.Error DF t-value p-value ## (Intercept) 43.99394 1.5285131 257 28.782179 0.0000 ## treatT1 0.91558 1.8868376 49 0.485248 0.6297 ## treatT2 -0.93894 1.9029835 49 -0.493404 0.6239 ## weekt 1.75896 0.3573133 257 4.922742 0.0000 ## treatT1:weekt -0.17039 0.4410771 257 -0.386303 0.6996 ## treatT2:weekt -0.58810 0.4448514 257 -1.322023 0.1873 ## Correlation: ## (Intr) tretT1 tretT2 weekt trtT1: ## treatT1 -0.810 ## treatT2 -0.803 0.651 ## weekt 0.232 -0.188 -0.186 ## treatT1:weekt -0.188 0.232 0.151 -0.810 ## treatT2:weekt -0.186 0.151 0.232 -0.803 0.651 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -3.251342407 -0.411642381 0.001663696 0.359700449 3.982562482 ## ## Number of Observations: 312 ## Number of Groups: 52
randomEffects = ranef(fnlme)
Remark:
It is assumed \(\mathbf{b}_i \sim N_2(\mathbf{0}, \mathbf{D})\).
The skewlmm package
remotes::install_github("fernandalschumacher/skewlmm") #or install.packages("skewlmm")
Its three main functions are smn.lmm, smsn.lmm, and smn.clmm:
library(skewlmm) smn.lmm(data, formFixed, groupVar, ...) smsn.lmm(data, formFixed, groupVar, ...) smn.clmm(data, formFixed, groupVar, ci, lcl, ucl, ...)
fskewlmm <- smn.lmm(formFixed = weight ~ treat*weekt, formRandom = ~ weekt, groupVar = "mouseID", data = miceweight) summary(fskewlmm)
## Linear mixed models with distribution norm and dependency structure UNC ## Call: ## smn.lmm(data = miceweight, formFixed = weight ~ treat * weekt, ## groupVar = "mouseID", formRandom = ~weekt) ## ## Distribution norm ## Random effects: ## Formula: ~weekt ## Structure: ## Estimated variance (D): ## (Intercept) weekt ## (Intercept) 22.952931 1.3137366 ## weekt 1.313737 0.8853784 ## ## Fixed effects: weight ~ treat * weekt ## with approximate confidence intervals ## Value Std.error CI 95% lower CI 95% upper ## (Intercept) 43.9939394 1.6480925 40.763737 47.2241414 ## treatT1 0.9155844 1.9111436 -2.830188 4.6613571 ## treatT2 -0.9389394 1.9806544 -4.820951 2.9430719 ## weekt 1.7589610 0.3746267 1.024706 2.4932159 ## treatT1:weekt -0.1703896 0.4492493 -1.050902 0.7101229 ## treatT2:weekt -0.5881039 0.4344989 -1.439706 0.2634984 ## ## Dependency structure: UNC ## Estimate(s): ## sigma2 ## 7.579361 ## ## Model selection criteria: ## logLik AIC BIC ## -862.978 1745.957 1783.387 ## ## Number of observations: 312 ## Number of groups: 52
fskewlmm
## Linear mixed models with distribution norm and dependency structure UNC ## Call: ## smn.lmm(data = miceweight, formFixed = weight ~ treat * weekt, ## groupVar = "mouseID", formRandom = ~weekt) ## ## Fixed: weight ~ treat * weekt ## Random: ## Formula: ~weekt ## Structure: General positive-definite ## Estimated variance (D): ## (Intercept) weekt ## (Intercept) 22.952931 1.3137366 ## weekt 1.313737 0.8853784 ## ## Estimated parameters: ## (Intercept) treatT1 treatT2 weekt treatT1:weekt treatT2:weekt sigma2 ## 43.9939 0.9156 -0.9389 1.7590 -0.1704 -0.5881 7.5794 ## s.e. 1.6481 1.9111 1.9807 0.3746 0.4492 0.4345 0.5441 ## Dsqrt11 Dsqrt12 Dsqrt22 ## 4.7854 0.2306 0.9123 ## s.e. 0.7351 0.1937 0.2028 ## ## Model selection criteria: ## logLik AIC BIC ## -862.978 1745.957 1783.387 ## ## Number of observations: 312 ## Number of groups: 52
smn.lmm
vs lmer
# Using smn.lmm summary(fskewlmm)
## Linear mixed models with distribution norm and dependency structure UNC ## Call: ## smn.lmm(data = miceweight, formFixed = weight ~ treat * weekt, ## groupVar = "mouseID", formRandom = ~weekt) ## ## Distribution norm ## Random effects: ## Formula: ~weekt ## Structure: ## Estimated variance (D): ## (Intercept) weekt ## (Intercept) 22.952931 1.3137366 ## weekt 1.313737 0.8853784 ## ## Fixed effects: weight ~ treat * weekt ## with approximate confidence intervals ## Value Std.error CI 95% lower CI 95% upper ## (Intercept) 43.9939394 1.6480925 40.763737 47.2241414 ## treatT1 0.9155844 1.9111436 -2.830188 4.6613571 ## treatT2 -0.9389394 1.9806544 -4.820951 2.9430719 ## weekt 1.7589610 0.3746267 1.024706 2.4932159 ## treatT1:weekt -0.1703896 0.4492493 -1.050902 0.7101229 ## treatT2:weekt -0.5881039 0.4344989 -1.439706 0.2634984 ## ## Dependency structure: UNC ## Estimate(s): ## sigma2 ## 7.579361 ## ## Model selection criteria: ## logLik AIC BIC ## -862.978 1745.957 1783.387 ## ## Number of observations: 312 ## Number of groups: 52
# Using lme4 summary(update(flme4, REML = FALSE))
## Linear mixed model fit by maximum likelihood ['lmerMod'] ## Formula: weight ~ treat * weekt + (weekt | mouseID) ## Data: miceweight ## ## AIC BIC logLik deviance df.resid ## 1746.0 1783.4 -863.0 1726.0 302 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.2724 -0.4092 0.0037 0.3693 4.0106 ## ## Random effects: ## Groups Name Variance Std.Dev. Corr ## mouseID (Intercept) 22.9551 4.7911 ## weekt 0.8906 0.9437 0.29 ## Residual 7.5729 2.7519 ## Number of obs: 312, groups: mouseID, 52 ## ## Fixed effects: ## Estimate Std. Error t value ## (Intercept) 43.9939 1.4838 29.650 ## treatT1 0.9156 1.8316 0.500 ## treatT2 -0.9389 1.8473 -0.508 ## weekt 1.7590 0.3469 5.071 ## treatT1:weekt -0.1704 0.4282 -0.398 ## treatT2:weekt -0.5881 0.4318 -1.362 ## ## Correlation of Fixed Effects: ## (Intr) tretT1 tretT2 weekt trtT1: ## treatT1 -0.810 ## treatT2 -0.803 0.651 ## weekt 0.232 -0.188 -0.186 ## treatT1:wkt -0.188 0.232 0.151 -0.810 ## treatT2:wkt -0.186 0.151 0.232 -0.803 0.651
acfresid(fskewlmm, maxLag = 4) %>% kable(digits = 4)
lag | ACF | n.used |
---|---|---|
0 | 1.0000 | 312 |
1 | 0.1898 | 260 |
2 | -0.1008 | 208 |
3 | -0.0677 | 156 |
4 | -0.1066 | 104 |
acfresid(fskewlmm, maxLag = 4, calcCI = TRUE) %>% plot()
The following methods/functions are available:
plot
summary
residuals
ranef
fitted
predict
update
mahalDist
mahalDistCens
The misspecification of the distributional assumption may result in invalid statistical inferences, especially when dealing with heavy tails and skewness
For instance, substantial bias in the ML estimates of regression parameters can result when the random-effects distribution is misspecified (Drikvandi et al., 2017)
In longitudinal studies, repeated measures are collected over time and hence the error term can be serially correlated (not necessarily handled by the random effects specification)
The following dependence structures can be helpful to model serial correlation:
Uncorrelated (UNC): \(\boldsymbol{\Omega}_i = \sigma^2_e \mathbf{I}_{n_i}\).
Autoregressive dependence of order \(p\) (AR(\(p\))): \[\boldsymbol{\Omega}_i = \sigma^2_e \mathbf{R}_i = \frac{\sigma^2_e}{1 - \phi_1 \rho_1 - \ldots - \phi_p \rho_p} [\rho_{|r-s|}],\] where \(\rho_1, \ldots, \rho_p\) are the theoretical autocorrelations of the process and functions of \(\boldsymbol{\phi} = (\phi_1, \ldots, \phi_p)^\top\), and they satisfy the Yule-Walker equations.
Damped exponential correlation (DEC): \[\boldsymbol{\Omega}_i = \sigma_e^2 \mathbf{R}_i = \sigma_e^2 \left[ \phi_1^{|t_{ij} - t_{ik}|^{\phi_2}} \right], \quad 0 < \phi_1 < 1, \phi_2 > 0.\]
We can fit an LMM with various correlation structures with the nlme package by specifying the correlation
argument:
# Autoregressive dependence of order 1 fnlmeAR1 = update(fnlme, correlation = corAR1(), method = "ML", control = lmeControl(maxIter = 200, msMaxIter = 200, msMaxEval = 100, opt='optim'))
And with the skewlmm package using the depStruct
argument:
# Autoregressive dependence of order 1 fskewlmmAR1 = update(fskewlmm, depStruct = "ARp")
smn.lmm
vs lme
# Using smn.lmm summary(fskewlmmAR1)
## Linear mixed models with distribution norm and dependency structure ARp ## Call: ## smn.lmm(data = miceweight, formFixed = weight ~ treat * weekt, ## groupVar = "mouseID", formRandom = ~weekt, depStruct = "ARp") ## ## Distribution norm ## Random effects: ## Formula: ~weekt ## Structure: ## Estimated variance (D): ## (Intercept) weekt ## (Intercept) 11.813242 1.6041352 ## weekt 1.604135 0.3610366 ## ## Fixed effects: weight ~ treat * weekt ## with approximate confidence intervals ## Value Std.error CI 95% lower CI 95% upper ## (Intercept) 44.2116106 1.5168291 41.2386801 47.1845410 ## treatT1 0.6134184 1.7744707 -2.8644802 4.0913171 ## treatT2 -1.1315595 1.8717922 -4.8002047 2.5370858 ## weekt 1.8422904 0.4649208 0.9310624 2.7535183 ## treatT1:weekt -0.1434296 0.5368396 -1.1956158 0.9087566 ## treatT2:weekt -0.5968562 0.5566771 -1.6879231 0.4942108 ## ## Dependency structure: ARp ## Estimate(s): ## sigma2 phi1 ## 10.5285097 0.6736481 ## ## Model selection criteria: ## logLik AIC BIC ## -850.458 1722.916 1764.089 ## ## Number of observations: 312 ## Number of groups: 52
# Using lme4 summary(fnlmeAR1)
## Linear mixed-effects model fit by maximum likelihood ## Data: miceweight ## AIC BIC logLik ## 1722.517 1763.69 -850.2584 ## ## Random effects: ## Formula: ~weekt | mouseID ## Structure: General positive-definite, Log-Cholesky parametrization ## StdDev Corr ## (Intercept) 3.2523021 (Intr) ## weekt 0.5743535 0.926 ## Residual 4.5337384 ## ## Correlation Structure: AR(1) ## Formula: ~1 | mouseID ## Parameter estimate(s): ## Phi ## 0.6933295 ## Fixed effects: weight ~ treat * weekt ## Value Std.Error DF t-value p-value ## (Intercept) 44.22118 1.4072319 257 31.424235 0.0000 ## treatT1 0.59932 1.7371247 49 0.345007 0.7316 ## treatT2 -1.14040 1.7519896 49 -0.650917 0.5181 ## weekt 1.84433 0.3973757 257 4.641279 0.0000 ## treatT1:weekt -0.14241 0.4905312 257 -0.290319 0.7718 ## treatT2:weekt -0.59691 0.4947287 257 -1.206534 0.2287 ## Correlation: ## (Intr) tretT1 tretT2 weekt trtT1: ## treatT1 -0.810 ## treatT2 -0.803 0.651 ## weekt 0.287 -0.232 -0.230 ## treatT1:weekt -0.232 0.287 0.187 -0.810 ## treatT2:weekt -0.230 0.187 0.287 -0.803 0.651 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -2.71637763 -0.56242622 -0.01053673 0.52145617 2.42664566 ## ## Number of Observations: 312 ## Number of Groups: 52
We can perform likelihood ratio tests to compare the fitted objects as follows:
lr.test(fskewlmmAR1, fskewlmm)
## ## Model selection criteria: ## logLik AIC BIC ## fskewlmmAR1 -850.458 1722.916 1764.089 ## fskewlmm -862.978 1745.957 1783.387 ## ## Likelihood-ratio Test ## ## chi-square statistics = 25.04094 ## df = 1 ## p-value = 5.612572e-07 ## ## The null hypothesis that both models represent the ## data equally well is rejected at level 0.05
acfresid(fskewlmmAR1, maxLag = 4, calcCI = TRUE) %>% plot()
A possible approach to reduce the effects of outliers when analyzing longitudinal data is to replace usual normality assumptions by heavier-than-normal tailed distributions.
In a previous work
\[ \left( \begin{array}{c} \textbf{b}_i \\ \boldsymbol{\epsilon}_i \end{array}\right) \stackrel {{\rm ind}}{\sim} \textrm{SMN}_{q+n_i} \left( \left(\begin{array}{c} \mathbf{0}\\ \mathbf{0} \end{array} \right), \left(\begin{array}{cc} \mathbf{D} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_i \end{array} \right); H \right), \]
where
the tail behavior depends on the distribution function specified by \(H\), including the distributions normal, Student-t, Slash, and contaminated-normal,
\(\boldsymbol{\Sigma}_i = \sigma_e^2 \mathbf{R}_i\), with \(\mathbf{R}_i = \mathbf{R}_i(\boldsymbol{\phi})\), \(\boldsymbol{\phi}=(\phi_1,\ldots,\phi_p)^\top\), is a structured matrix.
Updating the previous fitted model to consider the \(t\) and CN distributions to miceweight data:
ftAR1 <- update(fskewlmmAR1, distr = "t") fcnAR1 <- update(fskewlmmAR1, distr = "cn") criteria(list(normal = fskewlmmAR1, t = ftAR1, cn = fcnAR1)) %>% kable()
logLik | npar | AIC | BIC | |
---|---|---|---|---|
normal | -850.4579 | 11 | 1722.916 | 1764.089 |
t | -819.1339 | 12 | 1662.268 | 1707.184 |
cn | -815.7971 | 13 | 1657.594 | 1706.253 |
summary(fcnAR1)
## Linear mixed models with distribution cn and dependency structure ARp ## Call: ## smn.lmm(data = miceweight, formFixed = weight ~ treat * weekt, ## groupVar = "mouseID", formRandom = ~weekt, depStruct = "ARp", ## distr = "cn") ## ## Distribution cn with nu = 0.4349053 0.1426085 ## ## Random effects: ## Formula: ~weekt ## Structure: ## Estimated variance (D): ## (Intercept) weekt ## (Intercept) 5.2180664 0.7633666 ## weekt 0.7633666 0.1430489 ## ## Fixed effects: weight ~ treat * weekt ## with approximate confidence intervals ## Value Std.error CI 95% lower CI 95% upper ## (Intercept) 42.60777956 1.2565464 40.1449938 45.0705653 ## treatT1 0.60421060 1.4760881 -2.2888688 3.4972900 ## treatT2 -0.38441693 1.5243888 -3.3721641 2.6033302 ## weekt 1.15530811 0.4107661 0.3502213 1.9603949 ## treatT1:weekt 0.46052571 0.4764984 -0.4733940 1.3944455 ## treatT2:weekt -0.04879976 0.4680949 -0.9662488 0.8686493 ## ## Dependency structure: ARp ## Estimate(s): ## sigma2 phi1 ## 2.8646272 0.7740477 ## ## Model selection criteria: ## logLik AIC BIC ## -815.797 1657.594 1706.253 ## ## Number of observations: 312 ## Number of groups: 52
In applications, skewness is often handled by log-transformations. However, interpreting model estimates with respect to log-transformed outcomes is challenging and usually non-intuitive.
Instead, we propose to consider skewed distributions in such situations, which can naturally accommodate such feature.
In this context, the flexible family of distributions Scale Mixture of Skew-Normal can be considered
\[ \left( \begin{array}{c} \textbf{b}_i \\ \boldsymbol{\epsilon}_i \end{array}\right) \stackrel {{\rm ind}}{\sim} \textrm{SMSN}_{q+n_i} \left( \left(\begin{array}{c} c\boldsymbol{\Delta} \\ \mathbf{0} \end{array} \right), \left(\begin{array}{cc} \mathbf{D} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_i \end{array} \right), \left(\begin{array}{c} \boldsymbol{\lambda} \\ \mathbf{0} \end{array} \right); H \right), \]
the chosen location parameter for \(\mathbf{b}_i\) ensures that \(E\{\mathbf{b}_i\} = E\{\boldsymbol{\epsilon}_i\} = \mathbf{0}\), so \(E\{\mathbf{Y}_i\} =\textbf{X}_i\boldsymbol{\beta}\),
\(\boldsymbol{\lambda}\) regulates the skewness,
when \(\boldsymbol{\lambda}=\mathbf{0}\), the SMN is obtained as a special case.
We can fit a skewed model to the miceweight data with AR(1) correlation using the following syntax:
fscnAR1 <- smsn.lmm(formFixed = weight ~ treat*weekt, formRandom = ~ weekt, groupVar = "mouseID", data = miceweight, depStruct = "ARp", distr = "scn") criteria(list(normal = fskewlmmAR1, cn = fcnAR1, scn = fscnAR1)) %>% kable()
logLik | npar | AIC | BIC | |
---|---|---|---|---|
normal | -850.4579 | 11 | 1722.916 | 1764.089 |
cn | -815.7971 | 13 | 1657.594 | 1706.253 |
scn | -809.4435 | 15 | 1648.887 | 1705.032 |
lr.test(fscnAR1, fcnAR1)
## ## Model selection criteria: ## logLik AIC BIC ## fscnAR1 -809.443 1648.887 1705.032 ## fcnAR1 -815.797 1657.594 1706.253 ## ## Likelihood-ratio Test ## ## chi-square statistics = 12.70716 ## df = 2 ## p-value = 0.001740502 ## ## The null hypothesis that both models represent the ## data equally well is rejected at level 0.05
summary(fscnAR1)
## Linear mixed models with distribution scn and dependency structure ARp ## Call: ## smsn.lmm(data = miceweight, formFixed = weight ~ treat * weekt, ## groupVar = "mouseID", formRandom = ~weekt, depStruct = "ARp", ## distr = "scn") ## ## Distribution scn with nu = 0.4749987 0.1378151 ## ## Random effects: ## Formula: ~weekt ## Structure: ## Estimated variance (D): ## (Intercept) weekt ## (Intercept) 12.49990 1.5979104 ## weekt 1.59791 0.2805946 ## ## Fixed effects: weight ~ treat * weekt ## with approximate confidence intervals ## Value Std.error CI 95% lower CI 95% upper ## (Intercept) 44.21203515 1.6914356 40.8968824 47.5271879 ## treatT1 -0.26795376 1.3757840 -2.9644409 2.4285333 ## treatT2 -0.48368047 1.3266640 -3.0838941 2.1165332 ## weekt 1.38022226 0.4030417 0.5902751 2.1701694 ## treatT1:weekt 0.36485313 0.4302992 -0.4785177 1.2082240 ## treatT2:weekt -0.09452354 0.4091909 -0.8965230 0.7074759 ## ## Dependency structure: ARp ## Estimate(s): ## sigma2 phi1 ## 2.4720220 0.6774895 ## ## Skewness parameter estimate: 14.55327 4.307805 ## ## Model selection criteria: ## logLik AIC BIC ## -809.443 1648.887 1705.032 ## ## Number of observations: 312 ## Number of groups: 52
We can assess the goodness of fit using a Healy-type plot:
grid.arrange(healy.plot(fskewlmmAR1, calcCI = TRUE) + geom_point(size = 1.5), healy.plot(fscnAR1, calcCI = TRUE) + geom_point(size = 1.5), nrow = 1)
We can check for residual correlation based on ACF:
grid.arrange(plot(acfresid(fskewlmmAR1, calcCI = TRUE, maxLag = 5)), plot(acfresid(fscnAR1, calcCI = TRUE, maxLag = 5)), nrow = 1)
We can evaluate atypical observations using the Mahalanobis distance:
grid.arrange(plot(mahalDist(fskewlmmAR1), nlabels = 2), plot(mahalDist(fscnAR1), nlabels = 2), nrow = 1)
confint(fscnAR1) %>% round(3)
## Estimate CI 95% lower CI 95% upper ## (Intercept) 44.212 40.897 47.527 ## treatT1 -0.268 -2.964 2.429 ## treatT2 -0.484 -3.084 2.117 ## weekt 1.380 0.590 2.170 ## treatT1:weekt 0.365 -0.479 1.208 ## treatT2:weekt -0.095 -0.897 0.707 ## sigma2 2.472 1.519 3.425 ## phiAR1 0.677 0.249 1.106 ## Dsqrt11 3.511 1.128 5.894 ## Dsqrt12 0.416 -0.095 0.928 ## Dsqrt22 0.328 -0.570 1.226 ## lambda1 14.553 NA NA ## lambda2 4.308 NA NA ## nu1 0.475 NA NA ## nu2 0.138 NA NA
boot_fscnAR1 <- confint(fscnAR1, method = "bootstrap", B=100) boot_fscnAR1 %>% round(digits = 3)
## Estimate 2.5% 97.5% ## (Intercept) 44.212 41.817 46.281 ## treatT1 -0.268 -2.807 1.881 ## treatT2 -0.484 -3.172 1.690 ## weekt 1.380 0.958 2.000 ## treatT1:weekt 0.365 -0.272 0.950 ## treatT2:weekt -0.095 -0.833 0.471 ## sigma2 2.472 1.485 3.086 ## phiAR1 0.677 0.351 0.768 ## Dsqrt11 3.511 2.771 4.098 ## Dsqrt12 0.416 0.154 0.624 ## Dsqrt22 0.328 0.115 0.640 ## lambda1 14.553 12.956 90.634 ## lambda2 4.308 2.067 34.217 ## nu1 0.475 0.301 0.624 ## nu2 0.138 0.103 0.188
In a recent work
Case-deletion measures;
Local influence (under several perturbations).
Such tools will be incorporated in the skewlmm package soon.
Censoring can occur when measurement techniques are subject to limits of detection, above or below which the exact measure is not detectable .
Missing data can occur in longitudinal studies for different reasons. E.g.:
Loss to follow up;
Dropout (deliberate discontinuation);
Intermittent missing (missed appointments).
A missing observation can be considered as a censored observation for which the censoring interval is the full real line (\(-\infty,\infty\)).
Remark:
The context of censoring presented here is different from the one used in time-to-event outcomes.
Assume that some mice (more affected by the diet) dropped out on the study after the 8th week of diet:
Under a missing at random mechanism, a maximum likelihood-based estimates (e.g., LMM or GLMM) will usually yield unbiased estimates (as opposed to marginal models, such as GEE).
For improve the performance, a model-based approach based on Expectation-Maximization (EM) algorithm can be used for likelihood-based estimation, where missing values are treated as latent variable
Full model specification can be found in Matos et al.
This method is implemented in the context of censored data as part of the R package skewlmm.
smn.clmm
functionsmn.clmm(data, formFixed, groupVar, formRandom = ~1, depStruct = "UNC", ci, lcl, ucl, timeVar = NULL, distr = "norm", nufix = FALSE, pAR = 1, control = lmmControl())
depStruct
a character indicating the dependence structure
timeVar
a character containing the name of the time variable
pAR
a number indicating the order of the autoregressive process
distr
a character indicating which distribution should be used
ci
a character containing the name of the censoring indicator variable
lcl
a character containing the name of the lower censoring limit in data
ucl
a character containing the name of the upper censoring limit in data
fitCens <- smn.clmm(formFixed = weight ~ treat*weekt, formRandom = ~ weekt, groupVar = "mouseID", data = miceweight_miss, depStruct = "ARp", ci = "miss", lcl = "lcl") fitCensT <- update(fitCens, distr = "t")
criteria(list(normal = fitCens, t = fitCensT)) %>% kable()
logLik | npar | AIC | BIC | |
---|---|---|---|---|
normal | -1694.873 | 11 | 3411.746 | 3460.544 |
t | -1674.423 | 12 | 3372.846 | 3426.080 |
summary(fitCensT)$tableFixed %>% kable(digits=4)
Value | Std.error | CI 95% lower | CI 95% upper | |
---|---|---|---|---|
(Intercept) | 42.4492 | 0.9968 | 40.4955 | 44.4030 |
treatT1 | 1.7185 | 1.2021 | -0.6375 | 4.0746 |
treatT2 | -0.0171 | 1.2628 | -2.4922 | 2.4580 |
weekt | 1.1858 | 0.3449 | 0.5097 | 1.8619 |
treatT1:weekt | 0.6570 | 0.4029 | -0.1327 | 1.4467 |
treatT2:weekt | -0.1879 | 0.4135 | -0.9983 | 0.6226 |
SMSN-LMM:
Censored response:
Larissa Avila Matos
Associate Professor - Statistics