The 'dlmtree' function accommodates various response variable types, including continuous, binary, and zero-inflated count values. It is designed to handle both single exposures and exposure mixtures. For a single exposure, users may model non-linear effects (tdlnm), linear effects (tdlm), or heterogeneous subgroup/individualized effects (hdlm). For exposure mixtures, the function supports lagged interactions (tdlmm) and heterogeneous subgroup/individualized effects (hdlmm), allowing a comprehensive exploration of mixture exposure heterogeneity. Additionally, users can fine-tune parameters to impose effect shrinkage and perform exposure selection, enhancing the adaptability and precision of the modeling process. For more detailed documentation, visit: dlmtree website.
Usage
dlmtree(
formula,
data,
exposure.data,
dlm.type = "linear",
family = "gaussian",
mixture = FALSE,
het = FALSE,
n.trees = 20,
n.burn = 1000,
n.iter = 2000,
n.thin = 2,
shrinkage = "all",
dlmtree.params = c(0.95, 2),
dlmtree.step.prob = c(0.25, 0.25),
binomial.size = 1,
formula.zi = NULL,
tdlnm.exposure.splits = 20,
tdlnm.time.split.prob = NULL,
tdlnm.exposure.se = NULL,
hdlm.modifiers = "all",
hdlm.modifier.splits = 20,
hdlm.modtree.params = c(0.95, 2),
hdlm.modtree.step.prob = c(0.25, 0.25, 0.25),
hdlm.dlmtree.type = "shared",
hdlm.selection.prior = 0.5,
mixture.interactions = "noself",
mixture.prior = 1,
monotone.gamma0 = NULL,
monotone.sigma = NULL,
monotone.tree.time.params = c(0.95, 2),
monotone.tree.exp.params = c(0.95, 2),
monotone.time.kappa = NULL,
subset = NULL,
lowmem = FALSE,
verbose = TRUE,
save.data = TRUE,
diagnostics = FALSE,
initial.params = NULL
)
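As described above, the model class is selected through the combination of the dlm.type, mixture, and het arguments. The sketch below summarizes this mapping (a reading of the description above, not an exhaustive list of combinations) and shows one call; `dat` and `expos` are hypothetical placeholders for a user's own data objects.

```r
# Model selected by the combination of dlm.type, mixture, and het:
#   tdlm:  dlm.type = "linear",    mixture = FALSE, het = FALSE
#   tdlnm: dlm.type = "nonlinear", mixture = FALSE, het = FALSE
#   tdlmm: dlm.type = "linear",    mixture = TRUE,  het = FALSE
#   hdlm:  dlm.type = "linear",    mixture = FALSE, het = TRUE
#   hdlmm: dlm.type = "linear",    mixture = TRUE,  het = TRUE

# e.g., a heterogeneous mixture model (HDLMM); 'dat' is a data frame and
# 'expos' a named list of exposure matrices (hypothetical placeholders)
fit <- dlmtree(y ~ .,
               data = dat,
               exposure.data = expos,
               dlm.type = "linear",
               mixture = TRUE,
               het = TRUE)
```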
Arguments
- formula
object of class formula, a symbolic description of the fixed effect model to be fitted, e.g. y ~ a + b.
- data
data frame containing variables used in the formula.
- exposure.data
numerical matrix of exposure data with the same number of rows as data; for a mixture setting (tdlmm, hdlmm): a named list containing equally sized numerical matrices of exposure data, each with the same number of rows as data.
- dlm.type
dlm model specification: "linear" (default), "nonlinear", "monotone".
- family
'gaussian' for continuous response, 'logit' for binomial, 'zinb' for zero-inflated negative binomial.
- mixture
flag for mixture, set to TRUE for tdlmm and hdlmm. (default: FALSE)
- het
flag for heterogeneity, set to TRUE for hdlm and hdlmm. (default: FALSE)
- n.trees
integer for number of trees in ensemble.
- n.burn
integer for length of MCMC burn-in.
- n.iter
integer for number of MCMC iterations to run model after burn-in.
- n.thin
integer MCMC thinning factor, i.e. keep every n.thin-th iteration.
- shrinkage
character "all" (default), "trees", "exposures", or "none": turns on horseshoe-like shrinkage priors for different parts of the model.
- dlmtree.params
numerical vector of alpha and beta hyperparameters controlling dlm tree depth. (default: alpha = 0.95, beta = 2)
- dlmtree.step.prob
numerical vector for probability of each step for dlm tree updates: 1) grow/prune, 2) change, 3) switch exposure. (default: c(0.25, 0.25, 0.25))
- binomial.size
integer scalar (when the binomial size is the same for all observations; default: 1) or vector defining the binomial size for the 'logit' family.
- formula.zi
(only applies to family = 'zinb') object of class formula, a symbolic description of the fixed effects of the zero-inflated (ZI) model to be fitted, e.g. y ~ a + b. This applies only to ZINB models where the covariates of the ZI model differ from those of the negative binomial (NB) model. Defaults to the argument 'formula'.
- tdlnm.exposure.splits
scalar indicating the number of splits (divided evenly across quantiles of the exposure data) or list with two components: 'type' = 'values' or 'quantiles', and 'split.vals' = a numerical vector indicating the corresponding exposure values or quantiles for splits.
- tdlnm.time.split.prob
vector of splitting probabilities for time lags. (default: uniform probabilities)
- tdlnm.exposure.se
numerical matrix of exposure standard errors with the same size as exposure.data, or a scalar representing a uniform smoothing factor applied to each exposure measurement. (default: sd(exposure.data)/2)
- hdlm.modifiers
string vector containing desired modifiers to be included in a modifier tree. The strings in the vector must match the names of the columns of the data. By default, a modifier tree considers all covariates in the formula as modifiers unless stated otherwise.
- hdlm.modifier.splits
integer value to determine the possible number of splitting points that will be used for a modifier tree.
- hdlm.modtree.params
numerical vector of alpha and beta hyperparameters controlling modifier tree depth. (default: alpha = 0.95, beta = 2)
- hdlm.modtree.step.prob
numerical vector for probability of each step for modifier tree updates: 1) grow, 2) prune, 3) change. (default: c(0.25, 0.25, 0.25))
- hdlm.dlmtree.type
specification of dlmtree type for HDLM: shared (default) or nested.
- hdlm.selection.prior
scalar hyperparameter for sparsity of modifiers. Must be between 0.5 and 1. Smaller value corresponds to increased sparsity of modifiers.
- mixture.interactions
'noself' (default) which estimates interactions only between two different exposures, 'all' which also allows interactions within the same exposure, or 'none' which eliminates all interactions and estimates only main effects of each exposure.
- mixture.prior
positive scalar hyperparameter for sparsity of exposures. (default: 1)
- monotone.gamma0
vector (with length equal to number of lags) of means for logit-transformed prior probability of split at each lag; e.g., gamma_0l = 0 implies mean prior probability of split at lag l = 0.5.
- monotone.sigma
symmetric matrix (usually with only diagonal elements) corresponding to gamma_0 that defines variances on the prior probability of split; e.g., gamma_0l = 0 with the lth diagonal element of sigma = 2.701 implies that 95% of the time the prior probability of split is between 0.005 and 0.995. As a second example, setting gamma_0l = 4.119 with the corresponding diagonal element of sigma = 0.599 implies that 95% of the time the prior probability of a split is between 0.8 and 0.99.
- monotone.tree.time.params
numerical vector of hyperparameters for monotone time tree.
- monotone.tree.exp.params
numerical vector of hyperparameters for monotone exposure tree.
- monotone.time.kappa
scaling factor in the Dirichlet prior that accompanies tdlnm.time.split.prob to control the amount of prior information given to the model for deciding probabilities of splits between adjacent lags.
- subset
integer vector to analyze only a subset of data and exposures.
- lowmem
TRUE or FALSE (default): turn on a memory saver for DLNM, at the cost of slower computation.
- verbose
TRUE (default) or FALSE: print progress output.
- save.data
TRUE (default) or FALSE: save the data used for model fitting. This must be set to TRUE to use the shiny() function on hdlm or hdlmm fits.
- diagnostics
TRUE or FALSE (default): keep model diagnostics such as the number of terminal nodes and acceptance ratios.
- initial.params
initial parameters for the fixed effects model: NULL = none (default), "glm" = generate using GLM, or a user-defined vector whose length equals the number of parameters in the fixed effects model.
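Two of the arguments above accept non-scalar forms that are easy to miss: tdlnm.exposure.splits takes a list to place splits at specific values or quantiles, and formula.zi takes a separate formula for the zero-inflation model. The sketch below illustrates both; `dat` and `expos` are hypothetical placeholders for a user's own data objects.

```r
# Nonlinear model with splits placed at chosen exposure quantiles
# instead of the default 20 evenly spaced quantile splits
# ('dat' and 'expos' are hypothetical placeholders)
fit.nl <- dlmtree(y ~ .,
                  data = dat,
                  exposure.data = expos,
                  dlm.type = "nonlinear",
                  tdlnm.exposure.splits = list(
                    type = "quantiles",
                    split.vals = c(0.1, 0.25, 0.5, 0.75, 0.9)))

# Zero-inflated negative binomial response where the zero-inflation
# model uses a different covariate set than the count (NB) model
fit.zinb <- dlmtree(y ~ a + b + c,
                    data = dat,
                    exposure.data = expos,
                    family = "zinb",
                    formula.zi = y ~ a + b)
```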
Details
dlmtree
We recommend running the model for a minimum of 5,000 burn-in iterations followed by 15,000 sampling iterations with a thinning factor of 5. Convergence can be checked by re-running the model and validating the consistency of results. Examples of the syntax for running different types of models are provided below. For more examples, visit: dlmtree website.
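A production run following the recommendation above might look like the following sketch (only the MCMC settings differ from the defaults; `dat` and `expos` are hypothetical placeholders for a user's own data objects):

```r
# Recommended minimum MCMC settings: 5,000 burn-in iterations,
# 15,000 sampling iterations, thinning factor of 5
fit <- dlmtree(y ~ .,
               data = dat,
               exposure.data = expos,
               n.burn = 5000,
               n.iter = 15000,
               n.thin = 5)
```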
Examples
# \donttest{
# The first three examples are for one lagged exposure
# treed distributed lag model (TDLM)
# binary outcome with logit link
D <- sim.tdlmm(sim = "A", mean.p = 0.5, n = 1000)
tdlm.fit <- dlmtree(y ~ .,
data = D$dat,
exposure.data = D$exposures[[1]],
dlm.type = "linear",
family = "logit",
binomial.size = 1)
#> Preparing data...
#>
#> Running TDLM:
#> Burn-in % complete
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 2 seconds)
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...
# summarize results
tdlm.sum <- summary(tdlm.fit)
tdlm.sum
#> ---
#> TDLM summary
#>
#> Model run info:
#> - y ~ .
#> - sample size: 1,000
#> - family: logit
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 0.95 confidence level
#>
#> Fixed effect coefficients:
#> Mean Lower Upper
#> (Intercept) -0.211 -0.547 0.130
#> c1 0.078 -0.050 0.202
#> c2 0.088 -0.042 0.213
#> c3 -0.129 -0.250 0.003
#> c4 0.060 -0.070 0.189
#> *c5 -0.154 -0.282 -0.032
#> b1 -0.022 -0.270 0.218
#> b2 -0.052 -0.335 0.219
#> b3 0.053 -0.196 0.300
#> b4 0.103 -0.160 0.381
#> *b5 0.336 0.047 0.598
#> ---
#> * = CI does not contain zero
#>
#> DLM effect:
#> range = [-0.025, 0.093]
#> signal-to-noise = 0.231
#> critical windows: 20-27
#> Mean Lower Upper
#> Period 1 -0.014 -0.086 0.062
#> Period 2 -0.020 -0.096 0.035
#> Period 3 -0.025 -0.130 0.026
#> Period 4 0.001 -0.049 0.062
#> Period 5 0.010 -0.032 0.077
#> Period 6 0.008 -0.032 0.056
#> Period 7 0.011 -0.025 0.057
#> Period 8 0.010 -0.026 0.055
#> Period 9 0.010 -0.028 0.061
#> Period 10 0.006 -0.035 0.059
#> Period 11 -0.001 -0.051 0.042
#> Period 12 -0.007 -0.060 0.030
#> Period 13 -0.006 -0.061 0.030
#> Period 14 0.001 -0.050 0.045
#> Period 15 0.005 -0.045 0.048
#> Period 16 0.013 -0.028 0.057
#> Period 17 0.020 -0.023 0.068
#> Period 18 0.025 -0.022 0.074
#> Period 19 0.036 -0.025 0.095
#> *Period 20 0.061 0.005 0.119
#> *Period 21 0.075 0.020 0.123
#> *Period 22 0.078 0.016 0.127
#> *Period 23 0.091 0.047 0.155
#> *Period 24 0.093 0.048 0.157
#> *Period 25 0.089 0.044 0.140
#> *Period 26 0.084 0.034 0.139
#> *Period 27 0.084 0.025 0.172
#> Period 28 0.054 -0.013 0.122
#> Period 29 0.035 -0.024 0.088
#> Period 30 0.032 -0.017 0.075
#> Period 31 0.033 -0.015 0.078
#> Period 32 0.031 -0.019 0.075
#> Period 33 0.034 -0.011 0.084
#> Period 34 0.036 -0.012 0.099
#> Period 35 0.033 -0.021 0.102
#> Period 36 0.003 -0.090 0.063
#> Period 37 -0.010 -0.105 0.055
#> ---
#> * = CI does not contain zero
#>
#> ---
# plot results
plot(tdlm.sum)
# Treed distributed lag nonlinear model (TDLNM)
# Gaussian regression model
D <- sim.tdlnm(sim = "A", error.to.signal = 1)
tdlnm.fit <- dlmtree(formula = y ~ .,
data = D$dat,
exposure.data = D$exposures,
dlm.type = "nonlinear",
family = "gaussian")
#> Preparing data...
#>
#> Running TDLNM:
#> Burn-in % complete
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 0 seconds)
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...
# summarize results
tdlnm.sum <- summary(tdlnm.fit)
#> Centered DLNM at exposure value 0
tdlnm.sum
#> ---
#> TDLNM summary
#>
#> Model run info:
#> - y ~ .
#> - sample size: 926
#> - family: gaussian
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 0.95 confidence level
#>
#> Fixed effect coefficients:
#> Mean Lower Upper
#> (Intercept) -3.401 -13.502 7.062
#> *c1 0.907 0.797 1.006
#> *c2 -0.187 -0.299 -0.085
#> *c3 0.458 0.334 0.570
#> *c4 0.689 0.576 0.799
#> *c5 -1.406 -1.512 -1.293
#> *b1 1.453 1.252 1.666
#> *b2 -0.786 -1.009 -0.572
#> *b3 -0.639 -0.858 -0.421
#> *b4 -1.543 -1.759 -1.322
#> *b5 0.837 0.628 1.058
#> ---
#> * = CI does not contain zero
#>
#> DLNM effect:
#> range = [-1.203, 0.217]
#> signal-to-noise = 0.849
#> critical windows: 3-15,18-24
#>
#> residual standard errors: 0.138
# plot results
plot(tdlnm.sum)
# Heterogeneous TDLM (HDLM), similar to the first example but with a heterogeneous exposure response
D <- sim.hdlmm(sim = "B", n = 1000)
hdlm.fit <- dlmtree(y ~ .,
data = D$dat,
exposure.data = D$exposures,
dlm.type = "linear",
family = "gaussian",
het = TRUE)
#> Preparing data...
#>
#> Running shared HDLM:
#> Burn-in % complete
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 6 seconds)
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...
# summarize results
hdlm.sum <- summary(hdlm.fit)
hdlm.sum
#> ---
#> HDLM summary
#>
#> Model run info:
#> - y ~ .
#> - sample size: 1,000
#> - family: gaussian
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 0.5 modifier sparsity prior
#> - 0.95 confidence level
#>
#> Fixed effects:
#> Mean Lower Upper
#> (Intercept) 0.066 -0.125 0.265
#> *mod_num -0.770 -0.829 -0.709
#> *mod_bin -0.907 -1.038 -0.780
#> *mod_scale -0.816 -1.037 -0.603
#> *c1 -1.769 -1.837 -1.705
#> *c2 -0.326 -0.390 -0.262
#> *c3 -0.376 -0.437 -0.317
#> *c4 0.448 0.390 0.507
#> *c5 -0.844 -0.911 -0.782
#> *b1 -1.656 -1.779 -1.532
#> *b2 -0.217 -0.338 -0.101
#> *b3 1.359 1.229 1.488
#> *b4 1.043 0.918 1.167
#> b5 -0.025 -0.144 0.092
#> ---
#> * = CI does not contain zero
#>
#> Modifiers:
#> PIP
#> mod_num 1.000
#> mod_bin 0.198
#> mod_scale 1.000
#> c1 0.192
#> c2 0.231
#> c3 0.204
#> c4 0.193
#> c5 0.300
#> b1 0.220
#> b2 0.170
#> b3 0.234
#> b4 0.277
#> b5 0.344
#> ---
#> PIP = Posterior inclusion probability
#>
#> residual standard errors: 0.045
#> ---
#> To obtain exposure effect estimates, use the 'shiny(fit)' function.
#>
# shiny app for HDLM
if (interactive()) {
shiny(hdlm.fit)
}
# The next two examples are for a mixture (or multivariate) exposure
# Treed distributed lag mixture model (TDLMM)
# Model for mixture (or multivariate) lagged exposures
# with a homogeneous exposure-time-response function
D <- sim.tdlmm(sim = "B", error = 25, n = 1000)
tdlmm.fit <- dlmtree(y ~ .,
data = D$dat, exposure.data = D$exposures,
mixture.interactions = "noself",
dlm.type = "linear", family = "gaussian",
mixture = TRUE)
#> Preparing data...
#>
#> Running TDLMM:
#> Burn-in % complete
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 4 seconds)
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...
# summarize results
tdlmm.sum <- summary(tdlmm.fit)
#> Specified co-exposure values:
#> - e1 : 4.210665
#> - e2 : 4.521686
#> - e3 : 4.542955
#> - e4 : 4.16903
#> - e5 : 5.989966
#>
#> Reconstructing main effects...
#> Reconstructing interaction effects...
#> 0%...25%...50%...75%...100%
#> Calculating marginal effects...
#> Calculating fixed effects...
# plot the marginal exposure-response for one exposure
plot(tdlmm.sum, exposure1 = "e1")
# plot exposure-response surface
plot(tdlmm.sum, exposure1 = "e1", exposure2 = "e2")
# Heterogeneous version of TDLMM (HDLMM)
D <- sim.hdlmm(sim = "D", n = 1000)
hdlmm.fit <- dlmtree(y ~ .,
data = D$dat,
exposure.data = D$exposures,
dlm.type = "linear",
family = "gaussian",
mixture = TRUE,
het = TRUE)
#> Preparing data...
#>
#> Running HDLMM:
#> Burn-in % complete
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> MCMC iterations (est time: 18 seconds)
#> [0--------25--------50--------75--------100]
#> ''''''''''''''''''''''''''''''''''''''''''
#> Compiling results...
# summarize results
hdlmm.sum <- summary(hdlmm.fit)
hdlmm.sum
#> ---
#> HDLMM summary
#>
#> Model run info:
#> - y ~ .
#> - family: gaussian
#> - 20 trees
#> - 1000 burn-in iterations
#> - 2000 post-burn iterations
#> - 2 thinning factor
#> - 3 exposures measured at 37 time points
#> - 3 two-way interactions (no-self interactions)
#> - 0.5 modifier sparsity prior
#> - 1 exposure sparsity prior
#> - 0.95 confidence level
#>
#> Fixed effects:
#> Mean Lower Upper
#> (Intercept) 0.082 -0.183 0.356
#> *mod_num -0.524 -0.621 -0.424
#> *mod_bin 0.613 0.428 0.805
#> mod_scale 0.162 -0.087 0.413
#> *c1 -0.951 -1.017 -0.883
#> *c2 -0.354 -0.423 -0.291
#> *c3 -0.981 -1.047 -0.913
#> c4 0.025 -0.042 0.095
#> *c5 -0.376 -0.455 -0.307
#> *b1 -0.821 -0.957 -0.674
#> *b2 0.210 0.056 0.351
#> *b3 -1.716 -1.859 -1.571
#> *b4 0.395 0.252 0.532
#> *b5 0.409 0.265 0.548
#> ---
#> * = CI does not contain zero
#>
#> Modifiers:
#> PIP
#> mod_num 1.000
#> mod_bin 1.000
#> mod_scale 0.578
#> c1 0.597
#> c2 0.664
#> c3 0.621
#> c4 0.583
#> c5 0.619
#> b1 0.656
#> b2 0.544
#> b3 0.540
#> b4 0.650
#> b5 0.653
#> ---
#> PIP = Posterior inclusion probability
#>
#> residual standard errors: 0.151
#> ---
#> To obtain exposure effect estimates, use the 'shiny(fit)' function.
#>
# shiny app for HDLMM
if (interactive()) {
shiny(hdlmm.fit)
}
# }