
The 'dlmtree' function accommodates various response types, including continuous, binary, and zero-inflated count outcomes, and handles both single exposures and exposure mixtures. For a single exposure, users can model non-linear effects (tdlnm), linear effects (tdlm), or heterogeneous subgroup/individualized effects (hdlm). For exposure mixtures, the function supports lagged interactions (tdlmm) and heterogeneous subgroup/individualized effects (hdlmm), allowing a comprehensive exploration of mixture exposure heterogeneity. Users can also fine-tune parameters to impose effect shrinkage and perform exposure selection, enhancing the adaptability and precision of the modeling process. For more detailed documentation, visit the dlmtree website.
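As a quick orientation, the five model classes returned (see Value) correspond to combinations of the dlm.type, mixture, and het arguments; this sketch is inferred from the argument descriptions below:

```r
# Single exposure (exposure.data is a numerical matrix):
#   tdlm   dlm.type = "linear",                   het = FALSE
#   tdlnm  dlm.type = "nonlinear" or "monotone",  het = FALSE
#   hdlm   dlm.type = "linear",                   het = TRUE
# Exposure mixture (exposure.data is a named list; mixture = TRUE):
#   tdlmm  het = FALSE
#   hdlmm  het = TRUE
```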

Usage

dlmtree(
  formula,
  data,
  exposure.data,
  dlm.type = "linear",
  family = "gaussian",
  mixture = FALSE,
  het = FALSE,
  n.trees = 20,
  n.burn = 1000,
  n.iter = 2000,
  n.thin = 2,
  shrinkage = "all",
  dlmtree.params = c(0.95, 2),
  dlmtree.step.prob = c(0.25, 0.25),
  binomial.size = 1,
  formula.zi = NULL,
  tdlnm.exposure.splits = 20,
  tdlnm.time.split.prob = NULL,
  tdlnm.exposure.se = NULL,
  hdlm.modifiers = "all",
  hdlm.modifier.splits = 20,
  hdlm.modtree.params = c(0.95, 2),
  hdlm.modtree.step.prob = c(0.25, 0.25, 0.25),
  hdlm.dlmtree.type = "shared",
  hdlm.selection.prior = 0.5,
  mixture.interactions = "noself",
  mixture.prior = 1,
  monotone.gamma0 = NULL,
  monotone.sigma = NULL,
  monotone.tree.time.params = c(0.95, 2),
  monotone.tree.exp.params = c(0.95, 2),
  monotone.time.kappa = NULL,
  subset = NULL,
  lowmem = FALSE,
  verbose = TRUE,
  save.data = TRUE,
  diagnostics = FALSE,
  initial.params = NULL
)

Arguments

formula

object of class formula, a symbolic description of the fixed effect model to be fitted, e.g. y ~ a + b.

data

data frame containing variables used in the formula.

exposure.data

numerical matrix of exposure data with the same number of rows as data; for a mixture setting (tdlmm, hdlmm), a named list of equally sized numerical matrices of exposure data, each with the same number of rows as data.
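For the mixture setting, exposure.data is a named list of equally sized matrices; a minimal sketch with two simulated exposures (the names e1 and e2 and the 37-lag dimension are illustrative):

```r
n <- 500       # number of observations (rows of data)
n.lags <- 37   # number of lagged exposure measurements
exposure.list <- list(
  e1 = matrix(rnorm(n * n.lags), nrow = n, ncol = n.lags),  # one row per observation
  e2 = matrix(rnorm(n * n.lags), nrow = n, ncol = n.lags)   # same dimensions as e1
)
```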

dlm.type

dlm model specification: "linear" (default), "nonlinear", "monotone".

family

'gaussian' for continuous response, 'logit' for binomial, 'zinb' for zero-inflated negative binomial.

mixture

flag for mixture, set to TRUE for tdlmm and hdlmm. (default: FALSE)

het

flag for heterogeneity, set to TRUE for hdlm and hdlmm. (default: FALSE)

n.trees

integer for number of trees in ensemble.

n.burn

integer for length of MCMC burn-in.

n.iter

integer for number of MCMC iterations to run model after burn-in.

n.thin

integer MCMC thinning factor, i.e. keep every n.thin-th iteration.

shrinkage

character: "all" (default), "trees", "exposures", or "none"; turns on horseshoe-like shrinkage priors for different parts of the model.

dlmtree.params

numerical vector of alpha and beta hyperparameters controlling dlm tree depth. (default: alpha = 0.95, beta = 2)

dlmtree.step.prob

numerical vector for probability of each step for dlm tree updates: 1) grow/prune, 2) change, 3) switch exposure. (default: c(0.25, 0.25, 0.25))

binomial.size

integer scalar (used when all sizes are equal; default: 1) or integer vector defining the binomial size for the 'logit' family.
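A sketch of the two forms, assuming a hypothetical data frame dat whose responses are success counts out of a varying number of trials (the column name trials is illustrative):

```r
# Scalar form: every response is a single Bernoulli trial (the default)
binomial.size = 1

# Vector form: observation-specific trial counts,
# one entry per row of data ('dat$trials' is an illustrative column)
binomial.size = dat$trials
```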

formula.zi

(only applies to family = 'zinb') object of class formula, a symbolic description of the fixed effects of the zero-inflated (ZI) model to be fitted, e.g. y ~ a + b. Use this when the covariates of the ZI model differ from those of the negative binomial (NB) model; it defaults to the 'formula' argument.

tdlnm.exposure.splits

scalar indicating the number of splits (divided evenly across quantiles of the exposure data) or list with two components: 'type' = 'values' or 'quantiles', and 'split.vals' = a numerical vector indicating the corresponding exposure values or quantiles for splits.
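The two accepted forms can be sketched as follows (the split values are illustrative):

```r
# Scalar: 20 split points placed evenly across exposure quantiles
tdlnm.exposure.splits = 20

# List: explicit split locations, either on the exposure scale...
tdlnm.exposure.splits = list(type = "values",
                             split.vals = c(1, 2.5, 5, 10))
# ...or on the quantile scale
tdlnm.exposure.splits = list(type = "quantiles",
                             split.vals = c(0.25, 0.5, 0.75))
```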

tdlnm.time.split.prob

probability vector of splitting probabilities for time lags. (default: uniform probabilities)

tdlnm.exposure.se

numerical matrix of exposure standard errors with same size as exposure.data, or a scalar representing a uniform smoothing factor applied to each exposure measurement. (default: sd(exposure.data)/2)

hdlm.modifiers

string vector containing desired modifiers to be included in a modifier tree. The strings in the vector must match the names of the columns of the data. By default, a modifier tree considers all covariates in the formula as modifiers unless stated otherwise.
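Restricting the modifier search space can be sketched as follows (dat, expos, and the modifier column names mod_num and mod_bin are illustrative, mirroring the HDLM example below):

```r
# Consider only two covariates as potential effect modifiers;
# the names must match columns of 'data'
fit <- dlmtree(y ~ .,
               data = dat,
               exposure.data = expos,
               dlm.type = "linear",
               het = TRUE,
               hdlm.modifiers = c("mod_num", "mod_bin"))
```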

hdlm.modifier.splits

integer determining the number of possible splitting points used for a modifier tree.

hdlm.modtree.params

numerical vector of alpha and beta hyperparameters controlling modifier tree depth. (default: alpha = 0.95, beta = 2)

hdlm.modtree.step.prob

numerical vector for probability of each step for modifier tree updates: 1) grow, 2) prune, 3) change. (default: c(0.25, 0.25, 0.25))

hdlm.dlmtree.type

specification of dlmtree type for HDLM: shared (default) or nested.

hdlm.selection.prior

scalar hyperparameter for sparsity of modifiers. Must be between 0.5 and 1. Smaller value corresponds to increased sparsity of modifiers.

mixture.interactions

'noself' (default) which estimates interactions only between two different exposures, 'all' which also allows interactions within the same exposure, or 'none' which eliminates all interactions and estimates only main effects of each exposure.

mixture.prior

positive scalar hyperparameter for sparsity of exposures. (default: 1)

monotone.gamma0

vector (with length equal to number of lags) of means for logit-transformed prior probability of split at each lag; e.g., gamma_0l = 0 implies mean prior probability of split at lag l = 0.5.

monotone.sigma

symmetric matrix (usually diagonal) corresponding to gamma_0, defining the variances of the prior probability of a split. For example, gamma_0l = 0 with the lth diagonal element of sigma = 2.701 implies that 95% of the time the prior probability of a split lies between 0.005 and 0.995; as a second example, gamma_0l = 4.119 with the corresponding diagonal element of sigma = 0.599 implies that 95% of the time the prior probability of a split lies between 0.8 and 0.99.
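A sketch of constructing these priors for a 37-lag model, reusing the logit-scale values quoted in the description (the lag count is illustrative):

```r
n.lags <- 37
# Prior mean 0 on the logit scale => prior split probability 0.5 at every lag
gamma0 <- rep(0, n.lags)
# Diagonal prior covariance; 2.701 is the diffuse value quoted above
sigma  <- diag(2.701, n.lags)
# fit <- dlmtree(..., dlm.type = "monotone",
#                monotone.gamma0 = gamma0, monotone.sigma = sigma)
```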

monotone.tree.time.params

numerical vector of hyperparameters for monotone time tree.

monotone.tree.exp.params

numerical vector of hyperparameters for monotone exposure tree.

monotone.time.kappa

scaling factor in the Dirichlet prior, used alongside tdlnm.time.split.prob, controlling the amount of prior information given to the model when deciding the probabilities of splits between adjacent lags.

subset

integer vector to analyze only a subset of data and exposures.

lowmem

TRUE or FALSE (default): turns on the memory saver for DLNM at the cost of slower computation.

verbose

TRUE (default) or FALSE: print progress output.

save.data

TRUE (default) or FALSE: save the data used for model fitting. Must be TRUE to use the shiny() function on hdlm or hdlmm fits.

diagnostics

TRUE or FALSE (default): keep model diagnostics such as the number of terminal nodes and acceptance ratios.

initial.params

initial parameters for the fixed effects model: NULL = none (default), "glm" = generate using GLM, or a user-defined vector whose length equals the number of parameters in the fixed effects model.

Value

Object of one of the classes: tdlm, tdlmm, tdlnm, hdlm, hdlmm

Details

dlmtree

We recommend running the model for at least 5,000 burn-in iterations followed by 15,000 sampling iterations with a thinning factor of 5. Convergence can be checked by re-running the model and verifying the consistency of results. The examples below illustrate the syntax for running each model type. For more examples, visit the dlmtree website.
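The recommended run length can be passed directly through the MCMC arguments; a sketch reusing the simulated data from the first example below:

```r
# Longer chains recommended for final analyses:
# 5,000 burn-in and 15,000 sampling iterations, thinning factor 5
fit <- dlmtree(y ~ .,
               data = D$dat,
               exposure.data = D$exposures[[1]],
               n.burn = 5000,
               n.iter = 15000,
               n.thin = 5)
```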

Examples

# \donttest{

  # The first three examples are for a single lagged exposure


  # Treed distributed lag model (TDLM)
  # binary outcome with logit link

  D <- sim.tdlmm(sim = "A", mean.p = 0.5, n = 1000)
  tdlm.fit <- dlmtree(y ~ .,
                      data = D$dat,
                      exposure.data = D$exposures[[1]],
                      dlm.type = "linear",
                      family = "logit",
                      binomial.size = 1)
#> Preparing data...
#> 
#> Running TDLM:
#> Burn-in % complete 
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 2 seconds)
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...

  # summarize results
  tdlm.sum <- summary(tdlm.fit)
  tdlm.sum
#> ---
#> TDLM summary
#> 
#> Model run info:
#> - y ~ . 
#> - sample size: 1,000 
#> - family: logit 
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 0.95 confidence level
#> 
#> Fixed effect coefficients:
#>               Mean  Lower  Upper
#> (Intercept) -0.211 -0.547  0.130
#> c1           0.078 -0.050  0.202
#> c2           0.088 -0.042  0.213
#> c3          -0.129 -0.250  0.003
#> c4           0.060 -0.070  0.189
#> *c5         -0.154 -0.282 -0.032
#> b1          -0.022 -0.270  0.218
#> b2          -0.052 -0.335  0.219
#> b3           0.053 -0.196  0.300
#> b4           0.103 -0.160  0.381
#> *b5          0.336  0.047  0.598
#> ---
#> * = CI does not contain zero
#> 
#> DLM effect:
#> range = [-0.025, 0.093]
#> signal-to-noise = 0.231
#> critical windows: 20-27 
#>              Mean  Lower Upper
#> Period 1   -0.014 -0.086 0.062
#> Period 2   -0.020 -0.096 0.035
#> Period 3   -0.025 -0.130 0.026
#> Period 4    0.001 -0.049 0.062
#> Period 5    0.010 -0.032 0.077
#> Period 6    0.008 -0.032 0.056
#> Period 7    0.011 -0.025 0.057
#> Period 8    0.010 -0.026 0.055
#> Period 9    0.010 -0.028 0.061
#> Period 10   0.006 -0.035 0.059
#> Period 11  -0.001 -0.051 0.042
#> Period 12  -0.007 -0.060 0.030
#> Period 13  -0.006 -0.061 0.030
#> Period 14   0.001 -0.050 0.045
#> Period 15   0.005 -0.045 0.048
#> Period 16   0.013 -0.028 0.057
#> Period 17   0.020 -0.023 0.068
#> Period 18   0.025 -0.022 0.074
#> Period 19   0.036 -0.025 0.095
#> *Period 20  0.061  0.005 0.119
#> *Period 21  0.075  0.020 0.123
#> *Period 22  0.078  0.016 0.127
#> *Period 23  0.091  0.047 0.155
#> *Period 24  0.093  0.048 0.157
#> *Period 25  0.089  0.044 0.140
#> *Period 26  0.084  0.034 0.139
#> *Period 27  0.084  0.025 0.172
#> Period 28   0.054 -0.013 0.122
#> Period 29   0.035 -0.024 0.088
#> Period 30   0.032 -0.017 0.075
#> Period 31   0.033 -0.015 0.078
#> Period 32   0.031 -0.019 0.075
#> Period 33   0.034 -0.011 0.084
#> Period 34   0.036 -0.012 0.099
#> Period 35   0.033 -0.021 0.102
#> Period 36   0.003 -0.090 0.063
#> Period 37  -0.010 -0.105 0.055
#> ---
#> * = CI does not contain zero
#> 
#> ---

  # plot results
  plot(tdlm.sum)




  # Treed distributed lag nonlinear model (TDLNM)
  # Gaussian regression model
  D <- sim.tdlnm(sim = "A", error.to.signal = 1)
  tdlnm.fit <- dlmtree(formula = y ~ .,
                       data = D$dat,
                       exposure.data = D$exposures,
                       dlm.type = "nonlinear",
                       family = "gaussian")
#> Preparing data...
#> 
#> Running TDLNM:
#> Burn-in % complete 
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 0 seconds)
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...

  # summarize results
  tdlnm.sum <- summary(tdlnm.fit)
#> Centered DLNM at exposure value 0 
  tdlnm.sum
#> ---
#> TDLNM summary
#> 
#> Model run info:
#> - y ~ . 
#> - sample size: 926 
#> - family: gaussian 
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 0.95 confidence level
#> 
#> Fixed effect coefficients:
#>               Mean   Lower  Upper
#> (Intercept) -3.401 -13.502  7.062
#> *c1          0.907   0.797  1.006
#> *c2         -0.187  -0.299 -0.085
#> *c3          0.458   0.334  0.570
#> *c4          0.689   0.576  0.799
#> *c5         -1.406  -1.512 -1.293
#> *b1          1.453   1.252  1.666
#> *b2         -0.786  -1.009 -0.572
#> *b3         -0.639  -0.858 -0.421
#> *b4         -1.543  -1.759 -1.322
#> *b5          0.837   0.628  1.058
#> ---
#> * = CI does not contain zero
#> 
#> DLNM effect:
#> range = [-1.203, 0.217]
#> signal-to-noise = 0.849
#> critical windows: 3-15,18-24 
#> 
#> residual standard errors: 0.138 

  # plot results
  plot(tdlnm.sum)




  # Heterogeneous TDLM (HDLM), similar to the first example but with a heterogeneous exposure response
  D <- sim.hdlmm(sim = "B", n = 1000)
  hdlm.fit <- dlmtree(y ~ .,
                      data = D$dat,
                      exposure.data = D$exposures,
                      dlm.type = "linear",
                      family = "gaussian",
                      het = TRUE)
#> Preparing data...
#> 
#> Running shared HDLM:
#> Burn-in % complete 
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 6 seconds)
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...

  # summarize results
  hdlm.sum <- summary(hdlm.fit)
  hdlm.sum
#> ---
#> HDLM summary
#> 
#> Model run info:
#> - y ~ . 
#> - sample size: 1,000 
#> - family: gaussian 
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 0.5 modifier sparsity prior
#> - 0.95 confidence level
#> 
#> Fixed effects:
#>                Mean  Lower  Upper
#>  (Intercept)  0.066 -0.125  0.265
#> *mod_num     -0.770 -0.829 -0.709
#> *mod_bin     -0.907 -1.038 -0.780
#> *mod_scale   -0.816 -1.037 -0.603
#> *c1          -1.769 -1.837 -1.705
#> *c2          -0.326 -0.390 -0.262
#> *c3          -0.376 -0.437 -0.317
#> *c4           0.448  0.390  0.507
#> *c5          -0.844 -0.911 -0.782
#> *b1          -1.656 -1.779 -1.532
#> *b2          -0.217 -0.338 -0.101
#> *b3           1.359  1.229  1.488
#> *b4           1.043  0.918  1.167
#>  b5          -0.025 -0.144  0.092
#> ---
#> * = CI does not contain zero
#> 
#> Modifiers:
#>             PIP
#> mod_num   1.000
#> mod_bin   0.198
#> mod_scale 1.000
#> c1        0.192
#> c2        0.231
#> c3        0.204
#> c4        0.193
#> c5        0.300
#> b1        0.220
#> b2        0.170
#> b3        0.234
#> b4        0.277
#> b5        0.344
#> ---
#> PIP = Posterior inclusion probability
#> 
#> residual standard errors: 0.045
#> ---
#> To obtain exposure effect estimates, use the 'shiny(fit)' function.
#> 

  # shiny app for HDLM
  if (interactive()) {
    shiny(hdlm.fit)
  }



  # The next two examples are for a mixture (or multivariate) exposure


  # Treed distributed lag mixture model (TDLMM)
  # Model for mixture (or multivariate) lagged exposures
  # with a homogeneous exposure-time-response function
  D <- sim.tdlmm(sim = "B", error = 25, n = 1000)
  tdlmm.fit <- dlmtree(y ~ .,
                       data = D$dat, exposure.data = D$exposures,
                       mixture.interactions = "noself",
                       dlm.type = "linear", family = "gaussian",
                       mixture = TRUE)
#> Preparing data...
#> 
#> Running TDLMM:
#> Burn-in % complete 
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 4 seconds)
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...

  # summarize results
  tdlmm.sum <- summary(tdlmm.fit)
#> Specified co-exposure values:
#> - e1 : 4.210665
#> - e2 : 4.521686
#> - e3 : 4.542955
#> - e4 : 4.16903
#> - e5 : 5.989966
#> 
#> Reconstructing main effects...
#> Reconstructing interaction effects...
#> 0%...25%...50%...75%...100%
#> Calculating marginal effects...
#> Calculating fixed effects...

  # plot the marginal exposure-response for one exposure
  plot(tdlmm.sum, exposure1 = "e1")


  # plot exposure-response surface
  plot(tdlmm.sum, exposure1 = "e1", exposure2 = "e2")




  # Heterogeneous version of TDLMM (HDLMM)
  D <- sim.hdlmm(sim = "D", n = 1000)
  hdlmm.fit <- dlmtree(y ~ .,
                       data = D$dat,
                       exposure.data = D$exposures,
                       dlm.type = "linear",
                       family = "gaussian",
                       mixture = TRUE,
                       het = TRUE)
#> Preparing data...
#> 
#> Running HDLMM:
#> Burn-in % complete 
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 18 seconds)
#> [0--------25--------50--------75--------100]
#>  ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...

  # summarize results
  hdlmm.sum <- summary(hdlmm.fit)
  hdlmm.sum
#> ---
#> HDLMM summary
#> 
#> Model run info:
#> - y ~ . 
#> - family: gaussian 
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 3 exposures measured at 37 time points
#> - 3 two-way interactions (no-self interactions)
#> - 0.5 modifier sparsity prior
#> - 1 exposure sparsity prior
#> - 0.95 confidence level
#> 
#> Fixed effects:
#>                Mean  Lower  Upper
#>  (Intercept)  0.082 -0.183  0.356
#> *mod_num     -0.524 -0.621 -0.424
#> *mod_bin      0.613  0.428  0.805
#>  mod_scale    0.162 -0.087  0.413
#> *c1          -0.951 -1.017 -0.883
#> *c2          -0.354 -0.423 -0.291
#> *c3          -0.981 -1.047 -0.913
#>  c4           0.025 -0.042  0.095
#> *c5          -0.376 -0.455 -0.307
#> *b1          -0.821 -0.957 -0.674
#> *b2           0.210  0.056  0.351
#> *b3          -1.716 -1.859 -1.571
#> *b4           0.395  0.252  0.532
#> *b5           0.409  0.265  0.548
#> ---
#> * = CI does not contain zero
#> 
#> Modifiers:
#>             PIP
#> mod_num   1.000
#> mod_bin   1.000
#> mod_scale 0.578
#> c1        0.597
#> c2        0.664
#> c3        0.621
#> c4        0.583
#> c5        0.619
#> b1        0.656
#> b2        0.544
#> b3        0.540
#> b4        0.650
#> b5        0.653
#> ---
#> PIP = Posterior inclusion probability
#> 
#> residual standard errors: 0.151
#> ---
#> To obtain exposure effect estimates, use the 'shiny(fit)' function.
#> 

  # summarize results
  if (interactive()) {
    shiny(hdlmm.fit)
  }


# }