This document illustrates the main features of tdlnm within the dlmtree package. Additional information regarding the statistical methodology and computational details are given in Mork & Wilson (2021).

Overview of DLNMs

The distributed lag nonlinear model (DLNM) is a tool for estimating an exposure-response function due to a series of possibly correlated, lagged exposures. In particular, we consider the exposure-response function \(f(\mathbf{x}_i)=\sum_t w(x_{it},t)\) where \(w(x_{it},t)\) represents the DLNM.

TDLNM

Often, \(w\) is constrained to vary smoothly over time and estimated via spline-based methods. TDLNM allows for smoothness, but also enables a `hard’ cutoff in exposure effects at a given timepoint, which allows for more precision in identifying periods of effect/no-effect.

TDLNM method operates using a set of regression trees that each assume piecewise constant relationships across the exposure-time space. TDLNM outperforms spline-based models when the exposure-time surface is not smooth and performs similarly in smooth settings. TDLNM also produces lower variance estimates while maintain high coverage and more precisely identifies critical windows within the exposure-time surface.

Basic example

Installation

To install package dlmtree, use the devtools package and run function install_github.

library(devtools)
install_github("danielmork/dlmtree")

Data

For this example we will use a data simulation available in the dlmtree package.

library(dlmtree)
set.seed(1)
D <- sim.tdlnm(effect = "A", error.to.signal = 1) 
# Also try effects B (linear), C (logistic), and D (logistic and smooth in time)

Model

  • Define a fixed effects formula using data fed to the data parameter
  • Input exposure.data as a matrix, where the columns relate to the time periods of exposure (this could also be interpreted as lags where the first column is the most recent exposure)
  • Specify exposure.se for a smooth effect (TDLNMse) in exposure-concentration, in our paper we used half the standard deviation of the exposure data, which gave consistent results across a range of simulation scenarios
  • Define exposure.splits across the range of exposure-concentration values. This could be given as a fixed set of values, or a single value which will split evenly across the quantiles of the exposure data. Setting this to \(0\) results in a DLM (linear effects)
  • Run the model for the desired number of iterations. We recommend using \(20\) trees and \(\geq15000\) iterations thinned to every tenth iteration following \(\geq5000\) burn-in iterations. Model convergence can be checked by comparing the consistency of results across multiple runs
  • Our method currently supports a continuous or binary outcome by specifying ‘gaussian’ or ‘logit’ in the family parameter
splits <- seq(min(D$exposure), max(D$exposure), length.out = 52)
res <- tdlnm(formula = y ~ ., 
             data = D$dat, 
             exposure.data = as.matrix(D$exposure),
             #exposure.se = sd(D$exposure)/2, 
             exposure.splits = splits,
             n.trees = 20, n.burn = 2000, n.iter = 5000, n.thin = 5,
             family = "gaussian")
## Preparing data...
## exposure.splits entered as numeric vector, assuming values are exposure splitting points
## Burn-in % complete 
## [0--------25--------50--------75--------100]
##  ''''''''''''''''''''''''''''''''''''''''''
## MCMC iterations (est time: 5 seconds)
## [0--------25--------50--------75--------100]
##  ''''''''''''''''''''''''''''''''''''''''''
## Compiling results...

Summary

  • Summarize results using the summary function
  • Input ‘cenval’ will center the DLNM surface at a specified exposure concentration
  • The ‘pred.at’ input specifies a range of exposure values to make predictions, if empty this default to the splitting values
  • ‘conf.level’ will allow for different credible interval probabilities
  • Note that in the instance of a binary outcome, results are given as log(odds). This can be converted to a standard odds ratio using exp(res_sum$gamma.mean) and exp(res_sum$gamma.ci)
res_sum <- summary(res, 
                   cenval = D$cenval,
                   pred.at = seq(min(D$exposure), max(D$exposure), length.out = 100),
                   conf.level = 0.95)
## Centered DLNM at exposure value 1
res_sum
## ---
## TDLNM summary
## 
## Model run info
## - 20 trees
## - 2000 burn-in iterations
## - 5000 post-burn iterations
## - 5 thinning factor
## - 0.95 confidence level
## 
## Fixed effect coefficients:
##               Mean Lower.Bound Upper.Bound
## (Intercept) -4.150     -25.682      17.959
## *c1          0.683       0.586       0.785
## *c2          1.288       1.198       1.386
## *c3          0.301       0.200       0.400
## *c4         -0.994      -1.084      -0.904
## *c5         -0.575      -0.673      -0.475
## *b1         -2.451      -2.646      -2.245
## *b2         -1.569      -1.770      -1.366
## *b3          1.723       1.522       1.923
## *b4          0.580       0.379       0.776
## *b5         -1.512      -1.717      -1.304
## ---
## * = CI does not contain zero
## 
## DLNM effect:
## range = [-1.05, 0.064]
## signal-to-noise = 1.072
## critical windows: 11-15

Fixed Effects

boxplot(res$gamma[,-1], xlab = "Parameters", ylab = "Effect") # Boxplot estimated effects
points(1:10, res_sum$gamma.mean[-1], col = "red", pch = 16) # Posterior mean
points(1:10, D$params, col = "green", pch = 16) # Green dots = truth

Plot DLNM exposure-time response surface

plot(res_sum,
     plot.type = "mean", # (default), also 'se', 'ci-min', 'ci-max', 'slice'
     main = "Plot title", 
     xlab = "Time axis", 
     ylab = "Exposure-concentration axis", 
     flab = "Effect color label",
     start.time = 1) # change to reorient x-axis

Slice at exposure-concentration = 3

plot(res_sum, plot.type = "slice", val = 3, main = "Slice at concentration 3")

Slice at time = 13

plot(res_sum, plot.type = "slice", time = 13, main = "Slice at time 13")

Compare to truth

truth <- D$dlnm.fun(sapply(1:37, function(i) res_sum$pred.at), D$cenval, F)
# RMSE
sqrt(mean((res_sum$matfit - truth)^2))
## [1] 0.0318009
# Coverage
mean(res_sum$cilower < truth & res_sum$ciupper > truth)
## [1] 0.98
# True positive effect classification
(length(which(res_sum$cilower > 0 & truth > 0)) + 
    length(which(res_sum$ciupper < 0 & truth < 0))) /
  length(which(truth != 0))
## [1] 1
# False positive effect classification
(length(which(res_sum$cilower > 0 & truth == 0)) + 
    length(which(res_sum$ciupper < 0 & truth == 0))) /
  length(which(truth == 0))
## [1] 0