Learn BaSTA in R by Owen Jones, Maren Rebke and Fernando Colchero


Introduction

In most cases reesarchers will want to go beyond simply estimating functional forms of mortality and comparing their model fits. They will want to study the effects of individual covariates on the mortality trajectories. BaSTA can currently handle only covariates that do not vary through time. For example, one could fit a model with sex (for most species), but not body mass, since this varies through time. Nevertheless, BaSTA allows fitting of both continuous and categorical covariates in various ways and is a powerful tool for investigating a range of questions.

Formatting covariate data.

Before proceeding further, it is first necessary to introduce the format required for covariates. Briefly, the covariates must be included in the basta input data as a series of columns (with each row representing an individual) appended after the recapture matrix. It is very important that the order of individuals in both parts of the data are the same for obvious reasons! BaSTA includes a handy function for doing this, MakeCovMat().

In the following example we will include covariate information to the BaSTA input table. To do this, let’s read in the table covData.csv

covdat <- read.csv("data/covData.csv", header = TRUE)

which should look like

head(covdat)
#>   ID Sex IndivMeasure
#> 1  1   m      -0.2278
#> 2  3   m      -0.5347
#> 3  9   m       0.2889
#> 4 10   f       0.1523
#> 5 11   f       0.9765
#> 6 12   m       0.4826

Here, the data frame covdat contains one line per individual with sex information and with a continuous individual level covariate. We will focus on the sex covariate first. To construct a matrix of covariates that BaSTA can handle, it is enough to use the BaSTA function MakeCovMat()

covars <- MakeCovMat("~ Sex - 1", data = covdat)

Alternatively, we could have specified the name or number of the column(s) of the covariate(s) we wanted to include as

covars <- MakeCovMat("Sex", data = covdat)

The advantage of using R formulas is that they simplify incorporating interactions between covariates. The resulting matrix should look like this

head(covars)
#>   ID Sexf Sexm
#> 1  1    0    1
#> 2  3    0    1
#> 3  9    0    1
#> 4 10    1    0
#> 5 11    1    0
#> 6 12    0    1

Now we only need to collate the dat matrix that includes the capture-history information with the covars matrix

datCovs <- data.frame(dat, covars[, -1])

BaSTA analysis with categorical covariates

Now, to run the analysis, we simply use function basta() as we did before. Thus, to run a Gompertz model and save the output in an object gssc you can type

gssc <- basta(datCovs, studyStart = 1970, studyEnd = 2000, model = "LO",
              nsim = 2, ncpus = 2, parallel = TRUE)

which produces the following summary

summary(gssc)
#> 
#> Output from BaSTA version 1.9.4
#> 
#> Call:
#> Model                    : LO
#> Shape                    : simple
#> Covars. structure        : fused
#> Minimum age              : 0
#> Cat. covars.             : Sexf, Sexm
#> Cont. covars.            :   
#> 
#> Model settings:
#>    niter   burnin thinning     nsim 
#>    11000     1001       20        2 
#> 
#> Jumps and priors:
#>         Jump.sds Prior.means Prior.sds
#> b0.Sexf  0.32147      -3e+00         1
#> b0.Sexm  0.48692      -3e+00         1
#> b1.Sexf  0.02391       1e-02         1
#> b1.Sexm  0.07422       1e-02         1
#> b2.Sexf  0.29448       1e-10         1
#> b2.Sexm  0.46422       1e-10         1
#> 
#> Mean Kullback-Leibler
#> discrepancy calibration (KLDC):
#>                 b0     b1    b2
#> Sexm - Sexf 0.9163 0.9995 0.988
#> 
#> Coefficients:
#>         Estimate   StdErr Lower95%CI Upper95%CI SerAutocor UpdateRate
#> b0.Sexf  -4.5481 0.251849   -5.04947    -4.1072     0.7889     0.2551
#> b0.Sexm  -3.9917 0.348608   -4.78912    -3.3925     0.7831     0.2516
#> b1.Sexf   0.2017 0.027708    0.15670     0.2653     0.8813     0.2417
#> b1.Sexm   0.5051 0.090832    0.34579     0.6958     0.8616     0.2642
#> b2.Sexf   0.2530 0.179238    0.01355     0.6603     0.7918     0.2573
#> b2.Sexm   1.0797 0.355270    0.45627     1.7828     0.8071     0.2493
#> pi.1970   0.4998 0.008361    0.48302     0.5152     0.1678     1.0000
#>         PotScaleReduc
#> b0.Sexf        0.9992
#> b0.Sexm        0.9992
#> b1.Sexf        1.0007
#> b1.Sexm        0.9990
#> b2.Sexf        1.0060
#> b2.Sexm        0.9995
#> pi.1970        1.0046
#> 
#> Convergence:
#> Appropriate convergence reached for all parameters.
#> 
#> DIC:
#> 6982

Let’s have a look at the survival and mortality trajectories:

plot(gssc, fancy = TRUE)

plot of chunk unnamed-chunk-11

After running the model, you can verify if the inclusion of the sex covariate improved the fit by comparing the DICs between the multiout object and gssc. Now you can try to run the model for the Weibull and logistic models either individually (create objects called wssc and lssc) or in one go using multibasta() to create a multi-BaSTA object called multisc).

Comparing parameter estimates for categorical covariates

To find out to what extent the parameter estimates between males and females differ, you should look at the Mean Kulback-Leibler discrepancy calibration (KLDC) values printed in from the summary() function. You can find a detailed explanation of the KLDC values below, but in short for a given parameter the KLDC tell us how much overlap there is between the posterior densities of the female and male estimate of the parameter. For instance, print run a summary of gssc, you will see that the KLDCs for b0 and b1 between males and females are close to 1, which implies that there is almost no overlap between the posterior densities. A value close to 0.5 would imply that there is a lot of overlap.

Comparing parameter densities: Kullback-Leibler discrepancies

Within BaSTA, the Kullback-Leibler discrepancies (KLD; Kullback and Liebler 1951, McCulloch 1989) provide the user with a measure of how differently (or similarly) each categorical covariate affects survival. For instance, we may wish to evaluate the differences in survival between males and females with a simple Gompertz model. To illustrate the calculation of KLD, lets take b0, for which the resulting `sub-parameters’ would be αf and αm such that, for an individual i, we have b0 = αfIi + αm(1 − Ii), where Ii is an indicator function that assigns 1 if the individual is a female and 0 otherwise. For each of these parameters, BaSTA produces a posterior density, say Pf = p(αf|…) and Pm = p(αm|…), respectively. The KLD between these densities is calculated as


$$ K(P_f, P_m) = \int_{0}^{\infty} P_f \log\left(\frac{P_f}{P_m}\right) d\alpha. $$

The result can be interpreted as how far off we would be if we tried to predict αm from the posterior density of αf. If both densities are identical, then K(Pf, Pm) = 0, which suggests that there is no distinction between males and females for b0. As the KLD values increase the discrepancy becomes larger. As you can inferred from the equation above, the relationship is asymmetric, namely K(Pf, Pm) ≠ K(Pm, Pf).

To make KLD easier to interpret, McCulloch (1989) proposed a simple calibration of the KLD values that reduces the asymmetry. This is as follows: Let k = K(Pf, Pm) and q(k) be a calibration function such that


$$ \begin{eqnarray*} k & = & K(P_f,P_m) \\ &=& K\left\{B\left(\frac{1}{2}\right), B[q(k)]\right\}, \end{eqnarray*} $$

where $B(\frac{1}{2})$ is a Bernouilli distribution for an event with probability 1/2 (i.e. same probability of success and failure). This calibration is then calculated as


$$ q(k) = \frac{1 + (1 - e^{-2k})^{\frac{1}{2}}}{2} $$
.

Thus, q(k) ranges from 0.5 to 1, where a value of 0.5 means that the posterior densities are identical, and 1 that there is no overlap between them.

Changing the structure of the covariates

As you might have notticed, BaSTA by default assigns any categorical covariate as a function of the mortality parameters. We can change this in case we believe that these covariates only affect mortality proportionally, using a proportional hazards model. This is simply done with argument covarsStruct = "prop.haz":

gsscph <- basta(datCovs, studyStart = 1970, studyEnd = 2000, model = "LO",
              covarsStruct = "prop.haz", nsim = 2, ncpus = 2, 
              parallel = TRUE)

We can examine the DIC of the two models to see if using proportional hazards improves the fit.

gsscph$DIC
#>  D.ave D.mode     pD      k    DIC 
#>   5061   2797   2264      5   7325
gssc$DIC
#>  D.ave D.mode     pD      k    DIC 
#>   5033   3084   1949      7   6982

It seems not, so we would favour the model with fused covariate structure.

Adding continuous covariates

Finally, we will run the model adding a continuous covariate. To do this, let’s run function MakeCovMat() and add the new covariate to the formula

covars <- MakeCovMat("~ Sex + IndivMeasure - 1", data = covdat)

Now join the tables:

datCovs <- data.frame(dat, covars[, -1])

and run the analysis. Let’s call the new object for “Gompertz, simple, both covariates:

gsbc <- basta(datCovs, studyStart = 1970, studyEnd = 2000, model = "LO",
              nsim = 2, ncpus = 2, parallel = TRUE)

Now you can try to evaluate which model performs best. What if you test different functional forms with arguments model and shape (you can do this with mulitbasta()).