Weibull survival models with shrinkage priors in NIMBLE: a brief tutorial.

## Packages used
library(survival) # survreg() for Weibull models, survfit() for Kaplan-Meier curves
library(nimble)   # For Bayesian inference via MCMC
library(coda)     # For MCMC diagnostics

1 Introduction

Survival models are a class of statistical models used to analyse and/or predict the time until an event of interest occurs. Examples of such time-to-event data include the time of death, the time of onset of a disease, or the time to the hatching of an egg. These models are commonly used in medical research, ecology, epidemiology and other fields where time-to-event data are of interest.

Survival analyses typically exploit the idea that the time to an event can be affected by a number of factors, such as age, sex, medical history, and lifestyle. Survival models identify statistical relationships between such factors and the time-to-event data, and can be used to predict the time to an event for new individuals with known values of the predictor variables.

There are several different types of survival models; the choice depends on the specific data and the research question being addressed. The two most common classes of survival models are proportional hazards models and accelerated failure time models - and both types of models can be analysed in either parametric or non-parametric paradigms. Thus, there exists a wealth of different statistical techniques for survival analysis, each with its own strengths and limitations.

The Weibull survival model is a parametric survival model, which means that it assumes a specific functional form for the relationship between the predictor variables and the time to event data. Specifically, the Weibull model assumes that the survival times follow a Weibull distribution. This distribution has two parameters, which can be estimated from data using maximum likelihood estimation or Bayesian techniques.

Here we give a few short examples of how Weibull survival models are parametrised in the survreg() function of R’s survival package and in NIMBLE. The NIMBLE examples also explore two alternative shrinkage priors that may be useful (for Bayesian variable selection) when there are many candidate covariates.

Note, the current version of this tutorial does not cover censored data. We hope to add examples with censored data in future versions.

We’ll kick things off with a quick recap of the fundamentals.

2 Survival model fundamentals

In survival models (a.k.a. time-to-event or failure-time models), the response variable \(T\) is the time at which an event of interest occurs, e.g. time to death (survival), time to infection, or time at which a device breaks (failure).

At the heart of a survival model lies the survival function, defined as \[\begin{equation} S(t) = P(T>t) = 1 - \int_0^t f(u)\, \mathrm{d}u = 1 - F(t) \end{equation}\] where the lifetime (or failure time, or duration) \(T\) is a positive continuous random variable, \(f\) is a probability density function (PDF) describing the distribution of durations \(T\) and \(F\) is the associated cumulative distribution function (CDF).

Another fundamental quantity is the hazard rate function \(h(t)\), defined as \[\begin{equation} h(t) = \lim_{\Delta_t \rightarrow 0} \frac{S(t) - S(t+\Delta_t)}{\Delta_t S(t)}. \end{equation}\] It can be interpreted as the event rate conditional on survival until time \(t\) as it describes the risk of the event occurring at any given time.

Given a hazard function, survival can be obtained using \[\begin{equation} \tag{2.1} S(t) = \exp \Big(-\int_0^t h(u) du \Big) \end{equation}\] and the inverse relation is \[\begin{equation} h(t) = -\frac d{dt} \log S(t) \end{equation}\]

It can also sometimes be useful to recall that \[\begin{equation} h(t) = -\frac{S'(t)}{S(t)} = \frac {f(t)} {S(t)}. \end{equation}\]
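These identities are easy to verify numerically in base R, e.g. for a Weibull distribution (the shape and scale values below are arbitrary, chosen only for illustration):

```r
# Check h(t) = f(t)/S(t) against the closed-form Weibull hazard
# h(t) = (k/sigma) * (t/sigma)^(k-1), using R's parameterisation.
k <- 2; sigma <- 5
t <- c(0.5, 1, 2, 4)
f <- dweibull(t, shape = k, scale = sigma)
S <- 1 - pweibull(t, shape = k, scale = sigma)
hRatio  <- f / S                              # hazard via f(t)/S(t)
hClosed <- (k / sigma) * (t / sigma)^(k - 1)  # closed-form hazard
all.equal(hRatio, hClosed)  # TRUE
```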

2.1 Weibull model fundamentals

In a Weibull model, the time-to-event \(T\) follows a Weibull distribution with shape \(k > 0\) and scale parameter \(\sigma\). Writing \(\lambda = \sigma^{-k} > 0\), its PDF is \(f(t) = \lambda k t^{k - 1} e^{-\lambda t^k}\) and the corresponding hazard function is \[\begin{equation} \tag{2.2} h(t) = \lambda k t^{k-1} \end{equation}\] where \(\lambda\) scales the hazard and \(k\) controls its shape.

Given (2.2) and (2.1), the Weibull survival function is \[\begin{equation} \tag{2.3} S(t) = e^{-\lambda t^{k}} \end{equation}\]

A unique property of the Weibull model is that it is both (1) a proportional hazards model, because \(\lambda\) has a multiplicative effect on \(h(t)\), and (2) an accelerated failure time model, because rescaling \(\lambda\) is equivalent to rescaling time. Indeed, the Weibull is the only distribution that satisfies both properties.
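The dual character can be checked directly from (2.3): multiplying \(\lambda\) by a factor \(c\) gives the same survival curve as accelerating time by \(c^{1/k}\). A quick base-R check with arbitrary values:

```r
# S(t; c*lambda) equals S(c^(1/k) * t; lambda) for the Weibull model
k <- 2; lambda <- 0.04; cc <- 3
t <- c(1, 5, 10)
S_scaledHazard <- exp(-cc * lambda * t^k)            # hazard scaled by cc
S_scaledTime   <- exp(-lambda * (cc^(1/k) * t)^k)    # time scaled by cc^(1/k)
all.equal(S_scaledHazard, S_scaledTime)  # TRUE
```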

In a regression setting we typically make \(\lambda\) a function of covariates \[\begin{equation} \lambda_i = \exp \big(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_n x_{ni} \big) \end{equation}\] where \(x_\cdot\) are covariates, \(\beta_\cdot\) are regression coefficients, \(i\) indexes observations (this index was suppressed in the equations above), and the exponential link ensures that \(\lambda_i\) remains positive.
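A minimal base-R illustration of this log-linear link (the coefficient values are arbitrary, chosen only for illustration):

```r
# The exponential link keeps lambda_i positive for any covariate value
beta0  <- log(0.04)   # baseline log hazard scale
beta1  <- 0.5         # hypothetical covariate effect
x1     <- c(-2, 0, 2) # covariate values for three individuals
lambda <- exp(beta0 + beta1 * x1)
lambda  # all positive; equals 0.04 at x1 = 0
```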

The Weibull distribution is often parameterised via its shape (called \(k\) here) and one of two alternative scale parameters:

  1. the scale of the hazard, called \(\lambda\) in NIMBLE and here, and called \(b\) on Wikipedia.

  2. the scale of the Weibull distribution, called \(\lambda\) on Wikipedia and scale in R and NIMBLE. Here, I will use \(\sigma\) to denote the scale of the Weibull.

Do not confuse these alternative scale parameters, which are inversely related via \[\begin{equation} \sigma = \lambda^{-1/k} \end{equation}\]
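The relation is easy to verify in base R: survival computed from the hazard-scale form (2.3) matches pweibull evaluated with \(\sigma = \lambda^{-1/k}\):

```r
# Same distribution, two scale parameterisations
k      <- 2
lambda <- 0.04            # scale of the hazard
sigma  <- lambda^(-1/k)   # scale of the Weibull distribution
t <- c(1, 5, 10)
S_hazardForm <- exp(-lambda * t^k)                      # S(t) = exp(-lambda * t^k)
S_pweibull   <- 1 - pweibull(t, shape = k, scale = sigma)
all.equal(S_hazardForm, S_pweibull)  # TRUE
```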

3 survival::survreg

To kick things off, here’s a quick look at the survreg() function from the survival package.

# Exponential model = Weibull with k = 1: the two fits are the same
m1 <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, data=ovarian, dist='weibull', scale=1)
m2 <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, data=ovarian, dist="exponential")
# Weibull model
m3 <- survreg(Surv(futime, fustat) ~ ecog.ps + rx, data=ovarian, dist='weibull')
AIC(m1)
## [1] 200.3961
AIC(m2)
## [1] 200.3961
AIC(m3)
## [1] 202.169
# A model with different baseline survival shapes for two groups,
# i.e., two different scale parameters
m4 <- survreg(Surv(time, status) ~ ph.ecog + age + strata(sex), data=lung)

3.1 Potential confusion with alternative parameterisations in survreg, Wikipedia and NIMBLE

The survreg() function uses a different parameterisation, with an apparently non-standard definition of the “shape” and “scale” parameters. This is a recipe for confusion. The last example in ?survreg gives the following “guidance”

 # There are multiple ways to parameterize a Weibull distribution. The survreg
 # function embeds it in a general location-scale family, which is a
 # different parameterization than the rweibull function, and often leads
 # to confusion.
 #   survreg's scale  =    1/(rweibull shape)
 #   survreg's intercept = log(rweibull scale)
 #   For the log-likelihood all parameterizations lead to the same value.

In NIMBLE, the default second parameter of dweib is lambda (i.e. the scale of the hazard). In R’s stats::dweibull() the second parameter is scale. In a proportional hazards model it is natural to work with \(\lambda\) – we just need to take care when converting to the scale parameterisation used by dweibull and friends. See page two of https://r-nimble.org/cheatsheets/NimbleCheatSheet.pdf

Further risk of confusion arises when comparing with Wikipedia, because the \(\lambda\) on Wikipedia corresponds to the scale parameter in dweib (NIMBLE) and stats::dweibull() (R) – we avoid that usage here. The lambda in NIMBLE is called \(b\) on Wikipedia, where it features in the first alternative parameterisation.

At the present time, I do not understand the justification for survreg’s parameterisation (hopefully someone can explain it to me one day). For the rest of the current document I will preferentially parameterise in terms of the scale of the hazard function, i.e. \(\lambda\) in JAGS and NIMBLE.

The following example is intended to help navigate the confusing differences between the parameter names of rweibull and survreg.

# Generate data y with rweibull
Shape  <- 2                  # k
Lambda <- 0.04               # scale of hazard
Scale  <- Lambda^(-1/Shape)  # sigma, scale of Weibull distribution
n     <- 1e3
y     <- rweibull(n=n, shape=Shape, scale=Scale)

# Obtain Kaplan-Meier curve via survfit
km <- survfit(Surv(y)~1)

# Compare Kaplan-Meier to the survival function S(t) = 1 - F(t)
# i.e. the complementary distribution function
plot(km, lwd = 3, ylab = "Survival", xlab = "Time")
curve(1-pweibull(x, Shape, Scale), 0, 15, n=101, add=TRUE, col=2, lwd=3, lty=1)

# Fit Weibull model via survreg & add to plot
m5 <- survreg(Surv(y) ~ 1, dist = "weibull")
curve(
  1 - pweibull(x, shape=1/m5$scale, scale=exp(m5$coefficients["(Intercept)"])),
  0, 15, n=101, add=TRUE, col=3, lwd=3
)
legend("topright", col=1:3, lwd=3, cex=0.8,
       legend=c("Kaplan-Meier", "1-pweibull (original model)", "survreg (fitted model)"))

Figure 3.1: Comparison of 1-pweibull, Kaplan-Meier and a survreg model.

4 A simple Weibull model in NIMBLE

# BUGS code for NIMBLE model
bugsCode <- nimbleCode ({
  shape  ~ dhalfflat()
  lambda ~ dhalfflat()
  for (ii in 1:nObs) {
    y[ii] ~ dweib(shape = shape, lambda = lambda)
  }
})

# Build NIMBLE model
rMod <- nimbleModel(
  bugsCode,
  const = list(nObs = n),
  inits = list(shape  = Shape, lambda = Lambda),
  data = list(y = y)
)

The first thing we might want to do with this NIMBLE model is simulate some data, as a check that the model is set up correctly and that it is capable of replicating the original data y.

# Simulate some data with the NIMBLE model
simulate(rMod, "y", includeData = TRUE)

# Visual comparison of original data y and replicated data: summary table
rbind(c(summary(y), Var.=var(y)), c(summary(rMod$y), Var.=var(rMod$y)))
##           Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     Var.
## [1,] 0.2198114 2.547836 4.053675 4.373522 5.908898 12.67882 5.576799
## [2,] 0.1745131 2.533269 4.066641 4.307828 5.826038 12.83857 5.361133
# Visual comparison of original data y and replicated data: histogram
maxX <- ceiling(max(c(y, rMod[["y"]])))
hist(y,           breaks=seq(0, maxX, 0.5), col=(col1<-rgb(1,0,0,0.5)))
hist(rMod[["y"]], breaks=seq(0, maxX, 0.5), col=(col2<-rgb(0,0,1,0.5)), add=TRUE)
legend("topright", col=c(col1, col2), lwd=5, cex=0.8,
       legend=c("Original model", "NIMBLE model"))

Figure 4.1: Comparison of data generated by rweibull and NIMBLE model. Overlap of the two histograms is shown as deep purple.

# Visual comparison of original data y and replicated data: Q-Q plot
plot(quantile(y, probs = seq(0.01, 0.99, 0.01)),
     quantile(rMod[["y"]], probs = seq(0.01, 0.99, 0.01)),
     xlab = "Quantiles of data generated by rweibull",
     ylab = "Quantiles of data generated by NIMBLE model",
     main = "Q-Q plot: rweibull vs. NIMBLE model",
     pch = 19, col = "blue")
abline(0, 1)

Figure 4.2: Q-Q plot comparison of data generated by rweibull and NIMBLE model.

4.1 MCMC with simple Weibull model

Now we can repopulate the NIMBLE model with the original data (this step is not strictly required here, but in a real analysis it could be critically important), in order to do some Bayesian inference via MCMC.

rMod$y <- y                    # Replaces replicated data with original in NIMBLE model
cMod   <- compileNimble(rMod)  # Compile model to C++

We first configure (i.e. define and set up) an MCMC algorithm.

conf  <- configureMCMC(cMod)  # Configure an MCMC
## ===== Monitors =====
## thin = 1: lambda, shape
## ===== Samplers =====
## RW sampler (2)
##   - shape
##   - lambda

The printed output tells us a few things:

  1. thinning is set to 1.
  2. we will monitor (i.e. obtain output for) two parameters, hazard scale \(\lambda\) and shape \(k\).
  3. there are two random-walk (i.e. univariate adaptive Metropolis-Hastings) samplers in our MCMC algorithm.

It is possible to modify the default MCMC configuration - for example, to improve MCMC performance in some way. Details are available in the NIMBLE user manual. However, for our simple model this default configuration will do the job fine.
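As a sketch of what such a modification could look like (not run here; we keep the default configuration for this tutorial), the two univariate random-walk samplers could be replaced by a single adaptive block sampler, which can help when shape and lambda are correlated in the posterior:

```r
# Sketch only: swap the two univariate RW samplers for one
# adaptive block sampler on (shape, lambda).
conf$removeSamplers(c("shape", "lambda"))
conf$addSampler(target = c("shape", "lambda"), type = "RW_block")
conf$printSamplers()  # confirm the new sampler assignment
```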

Now that the MCMC is defined, we need to build, compile and run it.

mcmcR <- buildMCMC(conf)      # Build the MCMC
mcmcC <- compileNimble(mcmcR) # Compile to C++
mcmcC$run(niter=1e5)         # Run the MCMC

Next, we extract the MCMC output & convert it to an mcmc object using the coda package.

samps <- as.mcmc(as.matrix(mcmcC$mvSamples))

The first thing to do with any MCMC output is to assess if the mixing is acceptable or not. The following command provides a menu-driven set of diagnostics of the MCMC performance. See the coda package for more details.

codamenu()

4.2 Credible intervals from MCMC output

The most basic analysis is provided by coda’s summary command, which provides summaries of each marginal of the joint posterior distribution.

summary(samps)
## 
## Iterations = 1:1e+05
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##           Mean       SD  Naive SE Time-series SE
## lambda 0.04689 0.004409 1.394e-05      0.0001191
## shape  1.92174 0.048599 1.537e-04      0.0013035
## 
## 2. Quantiles for each variable:
## 
##           2.5%     25%    50%     75%   97.5%
## lambda 0.03885 0.04384 0.0467 0.04976 0.05612
## shape  1.82596 1.88913 1.9215 1.95444 2.01730

We can alternatively generate 95% CIs and median values for each marginal as follows.

medCIlam <- quantile(samps[,"lambda"], probs=c(0.025, 0.5, 0.975)) # 95% CI & median
medCIshp <- quantile(samps[,"shape"],  probs=c(0.025, 0.5, 0.975)) # 95% CI & median

Now we’ll plot the MCMC output and compare with the true values.

# Plot MCMC output & compare to the original parameter values
par(mfrow=n2mfrow(4))
# Lambda
traceplot(samps[,"lambda"], ylab="lambda")
abline(h=Lambda, col="red", lwd=2)        # Original value used to simulate y
abline(h=medCIlam, col="green", lty=c(2,1,2)) # 95% CI & median
densplot(samps[,"lambda"], xlab="lambda")
abline(v=Lambda, col="red", lwd=2)        # Original value used to simulate y
abline(v=medCIlam, col="green", lty=c(2,1,2)) # 95% CI & median
# Shape
traceplot(samps[,"shape"], ylab="shape")
abline(h=Shape, col="red", lwd=2)             # Original value used to simulate y
abline(h=medCIshp, col="green", lty=c(2,1,2)) # 95% CI & median
densplot(samps[,"shape"], xlab="shape")
abline(v=Shape, col="red", lwd=2)             # Original value used to simulate y
abline(v=medCIshp, col="green", lty=c(2,1,2)) # 95% CI & median

Figure 4.3: True vs. fitted (posterior samples) for parameters for Weibull survival model.

Next we calculate 95% credible intervals for a survival curve and a Q-Q plot.

predTimes <- c(0, km$time)            # For survival curve
nTimes    <- length(predTimes)
surv      <- array(NA, dim=c(nrow(samps), nTimes))
pSeq      <- seq(0.01, 0.99, by=0.01) # For Q-Q plot
quant     <- array(NA, dim=c(nrow(samps), length(pSeq)))
for (ii in 1:nrow(samps)) {
  if (ii %% (nrow(samps)/10) == 0)
    nimPrint(ii, "/", nrow(samps))   # Print progress
  surv[ii,]  <- 1-pweibull(predTimes, shape=samps[ii,"shape"], scale=(samps[ii,"lambda"])^(-1/samps[ii,"shape"]))
  quant[ii,] <-  qweibull(pSeq,      shape=samps[ii,"shape"], scale=(samps[ii,"lambda"])^(-1/samps[ii,"shape"]))
}
# Calculate quantiles for survival curve and Q-Q plot
survMedianCI95  <- apply(surv,  2, quantile, probs = c(0.025, 0.5, 0.975))
quantMedianCI95 <- apply(quant, 2, quantile, probs = c(0.025, 0.5, 0.975))

The following plot compares the Kaplan-Meier curve to the survival curve of the Bayesian Weibull model.

par(mfrow=c(1,1))
plot(km, lwd=3, ylab="Survival", xlab="Time")
polygon(
  x = c(predTimes, rev(predTimes)),
  y = c(survMedianCI95[1, ], rev(survMedianCI95[3, ])),
  col = "lightblue",
  border = NA
)
curve(1-pweibull(x, Shape, Scale), 0, 15, n=101, add=TRUE, col="red", lwd=3, lty=1)
lines(predTimes, survMedianCI95[2, ], col="blue", lwd=3, lty=2)
legend("topright", col=c("black", "blue", "lightblue", "red"), lwd=3, cex=0.8,
       legend = c("Kaplan-Meier", "Posterior Median", "Bayesian 95% CI", "Original model (target)"))

Figure 4.4: Survival curves from the original Weibull model (red), Bayesian model (blue) and Kaplan-Meier (black)

And now for a Q-Q plot.

quantY <- quantile(y, pSeq)
plot(quantY, quantMedianCI95[2,], typ="l", xlab="Quantile of observed data", ylab="Fitted quantile")
polygon(
  x = c(quantY, rev(quantY)),
  y = c(quantMedianCI95[1, ], rev(quantMedianCI95[3, ])),
  col = "red",
  border = NA
)
abline(0,1, lwd=3)
lines(quantY, quantMedianCI95[2,], typ="l", col="green", lwd=3)
legend("topleft", legend=c("95% CI","Posterior Median","1-1 line"),
       col=c("red","green","black"), cex=0.8, lwd=3)

Figure 4.5: Q-Q plot for the Bayesian Weibull model.

5 Bayesian variable selection

Bayesian variable selection can be performed using shrinkage priors. These priors shrink a parameter toward zero unless the data contain enough information to justify a non-zero value. We explore two approaches here: (1) Dirichlet-Laplace priors (Bhattacharya et al. 2015), and (2) horseshoe priors (Carvalho, Polson, and Scott 2009).

5.1 Approach 1: Dirichlet-Laplace prior

The following adaptation of the above NIMBLE model is based on equation 10 of Bhattacharya et al. (2015).

# BUGS code for Weibull model with Dirichlet-Laplace prior on beta1
dlCode <- nimbleCode ({
  shape ~ dhalfflat()
  beta0 ~ dnorm(0, sd=1E6)
  beta1 ~ ddexp(location=0, scale=phi1)
  phi1  ~ dgamma(shape=1/2, rate=1/2)
  for (ii in 1:nObs) {
    y[ii]       ~ dweib(shape=shape, lambda=lambda[ii])
    lambda[ii] <- exp(beta0 + beta1 * x1[ii])
  }
})
# Build the NIMBLE model
dlR <- nimbleModel(
  dlCode,
  const = list(nObs = n),
  inits = list(shape = Shape, beta0 = log(Lambda), beta1 = 0, phi1 = 1),
  data  = list(y = y, x1 = rnorm(n = n))
)

Note, here we follow Bhattacharya et al. (2015) and use shape = 1/2. However, Bhattacharya et al. (2015) also discuss using shape = 1/n, where n is the number of parameters, i.e. the more potential coefficients you add, the stricter the penalty for non-sparseness. They do note that shape = 1/n favours sparser models.
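As a quick base-R illustration of that penalty, one can compare prior draws of the scale phi1 under the two shape choices (n = 20 is a hypothetical number of coefficients, chosen only for illustration):

```r
# Smaller gamma shape concentrates the Dirichlet-Laplace scale phi1
# near zero, i.e. stronger shrinkage of the coefficient a priori.
set.seed(1)
nCoef   <- 20  # hypothetical number of coefficients
phiHalf <- rgamma(1e4, shape = 1/2,     rate = 1/2)
phiN    <- rgamma(1e4, shape = 1/nCoef, rate = 1/2)
c(median(phiHalf), median(phiN))  # shape = 1/n puts far more mass near zero
```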

Before running an MCMC, it is a good idea to check that each stochastic node has a finite log-likelihood (logProb in NIMBLE jargon). If any of the following commands returns -Inf, then there is a problem with the initial values provided to the model. Such problems need resolving prior to MCMC.

calculate(dlR, "shape")
## [1] 0
calculate(dlR, "beta0")
## [1] -14.73445
calculate(dlR, "beta1")
## [1] -0.6931472
calculate(dlR, "phi1")
## [1] -1.418939
calculate(dlR, "y")
## [1] -2222.017
calculate(dlR)
## [1] -2238.863

5.1.1 MCMC for Weibull model with Dirichlet-Laplace prior

We are now ready to compile the model, and set up and compile an MCMC algorithm.

dlC      <- compileNimble(dlR)
dlConf   <- configureMCMC(
  dlC,
  # Not sure why beta1 was being left out of default monitor list
  monitors = dlC$getNodeNames(includeData = FALSE, stochOnly = TRUE)
)
dlMcmcR  <- buildMCMC(dlConf)
dlMcmcC  <- compileNimble(dlMcmcR)
dlMcmcC$run(niter = 1e5)

We can now use the coda library to extract & summarise the MCMC samples as follows.

dlSamps <- as.mcmc(as.matrix(dlMcmcC$mvSamples))
summary(dlSamps) # Notice how the 95% CI on beta1 includes zero
## 
## Iterations = 1:1e+05
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##          Mean      SD  Naive SE Time-series SE
## beta0 -3.0737 0.09190 2.906e-04      0.0024618
## beta1 -0.0228 0.02801 8.859e-05      0.0004233
## phi1   0.1807 0.35139 1.111e-03      0.0146109
## shape  1.9262 0.04773 1.509e-04      0.0012785
## 
## 2. Quantiles for each variable:
## 
##             2.5%      25%      50%        75%    97.5%
## beta0 -3.2537475 -3.13667 -3.07278 -3.0125994 -2.89145
## beta1 -0.0862627 -0.04106 -0.01677 -0.0004088  0.01922
## phi1   0.0006223  0.01372  0.05289  0.1766017  1.19680
## shape  1.8311299  1.89443  1.92587  1.9587044  2.01867

Again, MCMC diagnostics should be performed to assess mixing. The following command provides a menu-driven set of tools for this purpose.

codamenu()

5.1.2 Some basic analysis of MCMC output for Weibull-Dirichlet-Laplace model

And we can also use coda to plot the MCMC output.

plot(dlSamps)

Figure 5.1: MCMC output for Weibull model with Dirichlet-Laplace prior on beta1.

Notice how the 95% CI on beta1 comfortably includes zero, with the posterior concentrated around zero.

We may also like to compare parameters from the original data-generation model with the MCMC-approximated marginal posterior distributions of key model parameters.

par(mfrow = n2mfrow(6))
# Intercept
ci95 <- quantile(dlSamps[,"beta0"], c(0.025, 0.975))
traceplot(dlSamps[,"beta0"], ylab="Intercept")
abline(h=log(Lambda), col="red", lwd=2)
abline(h=ci95, col="green", lwd=2)
densplot(dlSamps[,"beta0"], xlab="Intercept")
abline(v=log(Lambda), col="red", lwd=2)
abline(v=ci95, col="green", lwd=2)
# Shape
ci95 <- quantile(dlSamps[,"shape"], c(0.025, 0.975))
traceplot(dlSamps[,"shape"], ylab="shape")
abline(h=Shape, col="red", lwd=2)
abline(h=ci95, col="green", lwd=2)
densplot(dlSamps[,"shape"], xlab="shape")
abline(v=Shape, col="red", lwd=2)
abline(v=ci95, col="green", lwd=2)
# Beta1
ci95 <- quantile(dlSamps[,"beta1"], c(0.025, 0.975))
traceplot(dlSamps[,"beta1"], ylab="beta1")
abline(h=0, col="red", lwd=2)
abline(h=ci95, col="green", lwd=2)
densplot(dlSamps[,"beta1"], xlab="beta1")
abline(v=0, col="red", lwd=2)
abline(v=ci95, col="green", lwd=2)

Figure 5.2: True values (red) vs. fitted (posterior samples) for parameters of the Weibull model.

As above, we calculate the 95% credible intervals for the survival curve, and also for a Q-Q plot.

surv  <- array(NA, dim=c(nrow(dlSamps), nTimes))
quant <- array(NA, dim=c(nrow(dlSamps), length(pSeq)))
for (ii in 1:nrow(dlSamps)) {
  if (ii %% (nrow(dlSamps)/20) == 0)
    nimPrint(ii, "/", nrow(dlSamps))
  lambda <- exp(dlSamps[ii,"beta0"] + dlSamps[ii,"beta1"]*mean(dlC$x1))
  shape  <- dlSamps[ii,"shape"]
  surv[ii,]  <- 1-pweibull(predTimes, shape=dlSamps[ii,"shape"], scale=lambda^(-1/shape))
  quant[ii,] <- qweibull(pSeq, shape=dlSamps[ii,"shape"], scale=lambda^(-1/shape))
}
survMedianCI95  <- apply(surv,  2, quantile, probs = c(0.025, 0.5, 0.975))
quantMedianCI95 <- apply(quant, 2, quantile, probs = c(0.025, 0.5, 0.975))

The following code creates a plot comparing the Kaplan-Meier curve to the survival curve of the Bayesian Weibull model.

par(mfrow=c(1,1))
plot(km, lwd=3, ylab="Survival", xlab="Time")
polygon(
  x = c(predTimes, rev(predTimes)),
  y = c(survMedianCI95[1, ], rev(survMedianCI95[3, ])),
  col = "lightblue",
  border = NA
)
curve(1-pweibull(x, Shape, Scale), 0, 15, n=101, add=TRUE, col="red", lwd=3, lty=1)
lines(predTimes, survMedianCI95[2, ], col="blue", lwd=3, lty=2)
legend("topright", col=c("black", "blue", "lightblue", "red"), lwd=3, cex=0.8,
       legend = c("Kaplan-Meier", "Posterior Median", "Bayesian 95% CI", "Original model (target)"))

Figure 5.3: Survival curve from Bayesian Weibull model with Dirichlet-Laplace prior on beta1.

And now a Q-Q plot.

quantY <- quantile(y, pSeq)
plot(quantY, quantMedianCI95[2,], typ="l", xlab="Quantile of observed data", ylab="Fitted quantile")
polygon(
  x = c(quantY, rev(quantY)),
  y = c(quantMedianCI95[1, ], rev(quantMedianCI95[3, ])),
  col = "red",
  border = NA
)
abline(0,1, lwd=3)
lines(quantY, quantMedianCI95[2,], typ="l", col="green", lwd=3)
legend("topleft", legend=c("95% CI","Posterior Median","1-1 line"),
       col=c("red","green","black"), lwd=3, cex=0.8)

Figure 5.4: Q-Q plot for Bayesian Weibull model with Dirichlet-Laplace prior on beta1.

5.2 Approach 2: Horseshoe prior

This approach is based on equations 2.2 and 3.1 of Piironen and Vehtari (2017); the latter equation is cited as coming from Gelman (2006).

hsCode <- nimbleCode ({
  shape   ~ dhalfflat()
  beta0   ~ dnorm(0,sd=1E6)
  beta1   ~ dnorm(mean=0, sd=tau*lambda1)      # Note: Piironen eq 2.2 uses var not sd.
  lambda1 ~ T(dt(mu=0, sigma=1, df=1), 0, Inf) # Truncated Cauchy(0,1)
  # Truncated Cauchy(0,1), i.e. the Gelman 2006 approach.
  tau     ~ T(dt(mu=0, sigma=1, df=1), 0, Inf)
  for (ii in 1:nObs) {
    y[ii]       ~ dweib(shape=shape, lambda=lambda[ii])
    lambda[ii] <- exp(beta0 + beta1 * x1[ii])
  }
})

# Build NIMBLE model
hsR    <- nimbleModel(
  hsCode,
  const = list(nObs = n),
  inits = list(shape = Shape, beta0 = log(Lambda), beta1 = 0, tau = 1, lambda1 = 1),
  data = list(y = y, x1 = rnorm(n = n))
)
## Defining model
## Building model
## Setting data and initial values
## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.
## Checking model sizes and dimensions
# Check each stochastic node has a log-likelihood ("logProb" in NIMBLE jargon)
calculate(hsR, "shape")
## [1] 0
calculate(hsR, "beta0")
## [1] -14.73445
calculate(hsR, "beta1")
## [1] -0.9189385
calculate(hsR, "tau")
## [1] -1.14473
calculate(hsR, "lambda1")
## [1] -1.14473
calculate(hsR, "y")
## [1] -2222.017
calculate(hsR)
## [1] -2239.959

Now we can compile the model, build and compile an MCMC, and run the MCMC.

hsC     <- compileNimble(hsR)
hsConf  <- configureMCMC(
  hsC,
  # Not sure why beta1 is being left out of default monitor list
  monitors = hsC$getNodeNames(includeData = FALSE, stochOnly = TRUE)
)
hsMcmcR <- buildMCMC(hsConf)
hsMcmcC <- compileNimble(hsMcmcR)
hsMcmcC$run(niter = 1e5)

As above, the MCMC output can be summarised, using coda, as follows

hsSamps <- as.mcmc(as.matrix(hsMcmcC$mvSamples))
summary(hsSamps) # Notice how the 95% CI on beta1 includes zero
## 
## Iterations = 1:1e+05
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 1e+05 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##             Mean      SD  Naive SE Time-series SE
## beta0   -3.07095 0.09154 2.895e-04      0.0024331
## beta1    0.01323 0.02373 7.505e-05      0.0002255
## lambda1  0.52709 0.73855 2.336e-03      0.0183397
## shape    1.92499 0.04755 1.504e-04      0.0012590
## tau      0.59100 0.88648 2.803e-03      0.0271370
## 
## 2. Quantiles for each variable:
## 
##              2.5%       25%       50%      75%    97.5%
## beta0   -3.248603 -3.133924 -3.071359 -3.00691 -2.89507
## beta1   -0.029673 -0.001304  0.009052  0.02781  0.06636
## lambda1  0.005052  0.085645  0.275213  0.66601  2.59312
## shape    1.832889  1.891889  1.925005  1.95793  2.01712
## tau      0.006557  0.089657  0.281380  0.70489  3.24450

Again, ideally you would run this command (or something similar).

codamenu()

First, let’s look at the key parameters, and how the Bayesian estimates compare to the true values.

par(mfrow=n2mfrow(6))
# Intercept
ci95 <- quantile(hsSamps[,"beta0"], c(0.025, 0.975))
traceplot(hsSamps[,"beta0"], ylab="Base-line hazard")
abline(h=log(Lambda), col="red", lwd=2)
abline(h=ci95, col="green", lwd=2)
densplot(hsSamps[,"beta0"], xlab="Base-line hazard")
abline(v=log(Lambda), col="red", lwd=2)
abline(v=ci95, col="green", lwd=2)
# Shape
ci95 <- quantile(hsSamps[,"shape"], c(0.025, 0.975))
traceplot(hsSamps[,"shape"], ylab="shape")
abline(h=Shape, col="red", lwd=2)
abline(h=ci95, col="green", lwd=2)
densplot(hsSamps[,"shape"], xlab="shape")
abline(v=Shape, col="red", lwd=2)
abline(v=ci95, col="green", lwd=2)
# Beta1
ci95 <- quantile(hsSamps[,"beta1"], c(0.025, 0.975))
traceplot(hsSamps[,"beta1"], ylab="beta1")
abline(h=0, col="red", lwd=2)
abline(h=ci95, col="green", lwd=2)
densplot(hsSamps[,"beta1"], xlab="beta1")
abline(v=0, col="red", lwd=2)
abline(v=ci95, col="green", lwd=2)

Figure 5.5: Fitted vs. true values for key parameters.

Notice how the mode of the posterior of beta1 is close to zero; the posterior standard deviation is also small (here slightly smaller than with the Dirichlet-Laplace prior).

We can also check the MCMC output of the horseshoe parameters lambda1 and tau.

plot(hsSamps[,c("lambda1","tau")])

Figure 5.6: MCMC output for the horseshoe parameters lambda1 and tau.

Notice how the modes of the posteriors of both parameters are near zero.

Again, we calculate the 95% credible intervals for the survival curve, and also for a Q-Q plot.

surv  <- array(NA, dim=c(nrow(hsSamps), nTimes))
quant <- array(NA, dim=c(nrow(hsSamps), length(pSeq)))
for (ii in 1:nrow(hsSamps)) {
  if (ii %% (nrow(hsSamps)/20) == 0)
    nimPrint(ii, "/", nrow(hsSamps)) # Print progress
  lambda <- exp(hsSamps[ii,"beta0"] + hsSamps[ii,"beta1"]*mean(hsC$x1))
  shape  <- hsSamps[ii,"shape"]
  surv[ii,]  <- 1-pweibull(predTimes, shape=shape, scale=lambda^(-1/shape))
  quant[ii,] <-  qweibull(pSeq,      shape=shape, scale=lambda^(-1/shape))
}
survMedianCI95  <- apply(surv,  2, quantile, probs = c(0.025, 0.5, 0.975))
quantMedianCI95 <- apply(quant, 2, quantile, probs = c(0.025, 0.5, 0.975))

The following plot compares the Kaplan-Meier curve to the survival curve of the Bayesian Weibull model with horseshoe prior.

par(mfrow = c(1, 1))
plot(km, lwd = 3, ylab = "Survival", xlab = "Time")
polygon(
  x = c(predTimes, rev(predTimes)),
  y = c(survMedianCI95[1, ], rev(survMedianCI95[3, ])),
  col = "lightblue",
  border = NA
)
curve(1 - pweibull(x, Shape, Scale), 0, 15, n = 101, add = TRUE, col = "red", lwd = 3, lty = 1)
lines(predTimes, survMedianCI95[2,], col = "blue", lwd = 3, lty = 2)
legend("topright", col = c("black", "blue", "lightblue", "red"), lwd = 3, cex = 0.8,
       legend = c("Kaplan-Meier", "Posterior Median", "Bayesian 95% CI", "Original model (target)"))

Figure 5.7: Survival curve from Bayesian Weibull model with horseshoe prior on beta1.

And now for a Q-Q plot.

quantY <- quantile(y, pSeq)
plot(quantY, quantMedianCI95[2,], typ="l", xlab="Quantile of observed data", ylab="Fitted quantile")
polygon(
  x = c(quantY, rev(quantY)),
  y = c(quantMedianCI95[1, ], rev(quantMedianCI95[3, ])),
  col = "red",
  border = NA
)
abline(0,1, lwd=3)
lines(quantY, quantMedianCI95[2,], typ="l", col="green", lwd=3)
legend("topleft", col=c("red","green","black"), lwd=3, cex=0.8,
       legend=c("95% CI","Posterior Median","1-1 line"))

Figure 5.8: Q-Q plot for Bayesian Weibull model with horseshoe prior on beta1.

References

Bhattacharya, Anirban, Debdeep Pati, Natesh S Pillai, and David B Dunson. 2015. “Dirichlet–Laplace Priors for Optimal Shrinkage.” Journal of the American Statistical Association 110 (512): 1479–90.

Carvalho, Carlos M, Nicholas G Polson, and James G Scott. 2009. “Handling Sparsity via the Horseshoe.” In Artificial Intelligence and Statistics, 73–80. PMLR.

Gelman, Andrew. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34.

Piironen, Juho, and Aki Vehtari. 2017. “Sparsity Information and Regularization in the Horseshoe and Other Shrinkage Priors.” Electronic Journal of Statistics 11 (2): 5018–51.