vignettes/a05_SuggestedQualityControl.Rmd
a05_SuggestedQualityControl.Rmd
The specific requirements and assumptions for each function in
Rage
varies. Here we provide notes giving an overview of
these requirements, which are also applicable to other
functions/calculations in packages such as popbio
and
popdemo
. We make some suggestions of how users should
filter their dataset before analysis. To assist with this, the function
Rcompadre::cdb_flag
conducts a series of checks on the
matrices in a compadreDB
object and adds “flags” to
facilitate the filtering out of problematic matrices. This function
(cdb_flag
) can automatically be run by
Rcompadre::cdb_fetch
using the argument
flag = TRUE
.
The most obvious requirement for most of the Rage
methods is that missing (NA
) values in matrices prevent
calculations using those matrices. Sometimes these NA
values are in one of the submatrices (i.e., U,
F or C) of the matrix model, but other
submatrices are complete. For example, there may be NA
entries in the F submatrix, while the
U matrix remains complete. These issues are flagged
with the columns check_NA_A
, check_NA_U
,
check_NA_F
and check_NA_C
.
Submatrices composed entirely of zero values can also be problematic.
There may be good biological reasons for this phenomenon. Species that
do not reproduce clonally will have zero-value C
matrices. Another biologically reasonable explanation could be that in
the particular focal population in the particular focal year, there was
no sexual reproduction recorded, so the F matrix was
composed entirely of zeros. Nevertheless, zero-value submatrices can
cause some calculations to fail and it may be necessary to exclude them.
These issues are flagged with the columns check_zero_F
,
check_zero_C
, check_zero_U
.
In a biologically reasonable matrix population model, the set of
survival and growth transitions (i.e., in the U matrix)
from a particular stage cannot exceed 1. However, in some cases, errors
in the original matrices (including data entry and rounding errors)
cause this situation to occur and may persist in the data set. We can
check for this error using column sums of the U matrix,
and may wish to exclude matrices with any column sum greater than
1.
This issue is examined using the column SurvivalIssue
which
gives the maximum value of the column sums for matU
. An
additional column check_surv_gte_1
(produced with
cdb_flag
) reports whether any single value
is greater than or equal to 1.
At the opposite end of the survival spectrum, there may be some
matrices where some of the column sums of the U matrix
are zero, implying that there is no survival from that particular stage.
This may be a perfectly valid parameterisation for a particular
year/place but is biologically unreasonable in the longer term and users
may wish to exclude problematic matrices from their analysis. This issue
is indicated by the column check_zero_U_colsum
.
Several matrix manipulations or calculations require that the MPM
(matA
) be irreducible and ergodic (Stott et al. 2018).
Irreducible MPMs are those where parameterised transition rates
facilitate pathways from all stages to all other stages. Conversely,
reducible MPMs depict incomplete life cycles where pathways from all
stages to every other stage are not possible. Ergodic MPMs are those
where there is a single asymptotic stable state that does not
depend on initial stage structure. Conversely, non-ergodic MPMs are
those where there are multiple asymptotic stable states, which depend on
initial stage structure. MPMs that are reducible and/or non-ergodic are
usually biologically unreasonable, both in terms of their life cycle
description and their projected dynamics. They cause some calculations
in Rage
(and elsewhere) to fail. Irreducibility is
necessary but not sufficient for ergodicity. These issues are flagged
with check_irreducible
and check_ergodic
. Even
if Rage
functions do not fail due to these issues, the fact
that they can indicate biologically unreasonable life cycles may mean
that users nevertheless wish to exclude reducible, non-ergodic matrices
from their analyses.
Matrices are said to be singular if they cannot be inverted.
Inversion is required for many matrix calculations and, therefore,
singularity can cause some calculations to fail. This issue is flagged
with check_singular_U
. Calculations for
longevity
, life_expect_mean
,
life_expect_var
and net_repro_rate
fail with
singular matrices, so users may wish to exclude singular matrices when
conducting analyses using these functions.
A complete MPM (A) can be split into its component
submatrices (i.e. U, F and
C). The sum of these submatrices should equal the
complete MPM (i.e. A = U +
F + C). Sometimes, however, errors
occur so that the submatrices do NOT sum to A.
Normally, this is caused by rounding errors, but more significant errors
are possible. This problem is flagged with
check_component_sum
(only relevant for divided (split)
matrices). We recommend that users carefully check their matrices for
these errors and correct or exclude them as appropriate.
It is a general requirement for almost all Rage
functions that the matrices used as arguments do not include
NA
values. With divided (split) matrices, NA
values may be present in some submatrices but not others. For example,
the U matrix may be complete, but the
F matrix may have NA
values. In this case,
functions that require an F matrix will fail, while
those that only require a U matrix will work. Users
should filter the data to exclude entries with NA
values in
the matrices required for their analysis. The functions
mpm_split
, mpm_rearrange
and
mpm_standardise
do not require complete
NA
-free matrices.
For functions that use the U matrix, we further
suggest filtering the data to exclude the biologically unreasonable
entries where one or more of the matU
columns sum to zero,
or to greater than 1 (see Excessive Survival, above).
Alternatively, users could examine the offending matrices and make
sensible corrections (e.g. to correct rounding errors).
For functions that use the F matrix, and where sexual reproduction is known to occur in the species, we suggest that users consider filtering the data to exclude entries where F is entirely zero. This is not always desirable because there are some situations where zero recorded reproduction is biologically reasonable. We suggest a similar approach for the C matrix.
When using age-from-stage methods, users should be aware of the issue
of convergence to quasi-stationary distribution (see ). Briefly, All
age-from-stage calculations produce age-trajectories that inevitably
asymptote as a mathematical consequence of describing the vital rates as
functions of discrete stages (Horvitz & Tuljapurkar, 2008). This
mathematical artefact can introduce bias into measures obtained using
age-from-stage methods. Rage
provides a convenient and
principled way of correcting for this artefact by imposing a lower
probability threshold defined by the degree of convergence to the
quasi-stationary distribution (see ). We suggest that users filter out
from their analyses matrices that do not pass this threshold
criterion.
Users should also be aware of the issue of census type. For
populations that reproduce in a pulse once per year. The demographic
census may be carried out before or after the reproduction event. There
are thus two types of census: Pre- and post-reproductive census. This
distinction has potentially important implications for demographic
measures because of its effects on measured population structure. For
example, the fraction of individuals in the first age class will tend to
be larger larger in a post-reproductive census than a pre-reproductive
census. There is a column in the com(p)adre metadata
(CensusType
) that is intended to record this information
but, because authors of source publications have rarely clearly stated
this information, it is very incomplete. For serious analyses we
therefore recommend that users carefully collect this information
themselves from the source papers.
Although we highlight here a range of issues that could cause problems for MPM analyses we have likely inadvertently omitted some issues. We therefore urge users to carefully consider issues that may pertain to their particular analyses.
Horvitz, C. C., & Tuljapurkar, S. (2008). Stage dynamics, period survival, and mortality plateaus. The American Naturalist, 172(2), 203–215.
Stott, I., Townley, S., & Carslake, D. (2010). On reducibility and ergodicity of population projection matrix models. Methods in Ecology and Evolution. 1 (3), 242-252