Chapter 23 Examples of statistics reporting
In this section I give some examples of how one can describe methods and report the results of statistical analyses. These are not “set in stone”. There are certainly other ways to write these up, but these examples will give you something to base your own reporting on.
23.1 t-test
This example concerns a t-test where the mean heights of plants after a set amount of growing time were compared between two groups (control and treatment).
The methods could be reported like this:
To assess whether there was a significant difference in mean height between the two groups I used a two-sample t-test in which the variances of the two groups were assumed to be equal.
Based on the following t-test result:
> results <- t.test(control, treatment, var.equal = TRUE)
> print(results)
Two Sample t-test
data: control and treatment
t = -1.6184, df = 8, p-value = 0.1431
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.6644772 0.0644772
sample estimates:
mean of x mean of y
7.2 8.2
You might report this as:
My t-test comparing the mean height of plants in the control group and the treatment group showed that the mean height of plants in the control group was 7.2 cm while the mean height of plants in the treatment group was 8.2 cm. The 95% confidence interval of the difference between the mean values was -1.66 cm to 0.06 cm, so there is no significant difference between the two groups (t-statistic = -1.6184, d.f. = 8, p-value = 0.1431).
In this example, the important details of the t-test (the t-statistic, the degrees of freedom, and the p-value) are included in parentheses after giving the mean estimates for the control and treatment groups. This allows the reader to see the key results of the t-test while also providing the relevant details.
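If you prepare your report in R Markdown you can also pull these numbers straight from the object returned by t.test() rather than copying them by hand. A minimal sketch, using the results object created above (all of these components are standard parts of an htest object):

results$estimate    # the two group means (mean of x, mean of y)
results$conf.int    # 95% confidence interval for the difference in means
results$statistic   # t-statistic
results$parameter   # degrees of freedom
results$p.value     # p-value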
23.2 Simple linear regression model
In this example, the model focuses on the relationship between sepal length and sepal width in the iris
dataset.
For the methods, I might write something like this:
To investigate the relationship between sepal length and sepal width I fitted a linear regression with sepal length as the response variable and sepal width as the explanatory variable.
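In R this only takes a couple of lines. The formula below matches the call shown in the summary() output later in this section; the object name model is simply my choice.

# Fit the linear regression using the built-in iris data
model <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# ANOVA table (F-test) and coefficient summary for the fitted model
anova(model)
summary(model)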
The output of anova() looks like this:
Analysis of Variance Table
Response: Sepal.Length
Df Sum Sq Mean Sq F value Pr(>F)
Sepal.Width 1 63.21 63.206 89.59 <2e-16 ***
Residuals 148 73.81 0.497
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We might use this part of the summary to say something like this:
The linear regression analysis investigating the relationship between sepal length and sepal width explained a significant amount of variation in sepal length (F = 89.59, d.f. = 1 and 148, p-value < 0.01).
The output of summary() looks like this:
Call:
lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Residuals:
Min 1Q Median 3Q Max
-1.5396 -0.4374 -0.0813 0.4304 1.7227
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.55147 0.14392 24.630 < 2e-16 ***
Sepal.Width 0.81633 0.02047 39.858 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7029 on 148 degrees of freedom
Multiple R-squared: 0.643, Adjusted R-squared: 0.642
F-statistic: 1593 on 1 and 148 DF, p-value: < 2.2e-16
You might write this up something like this:
The coefficient for sepal width was 0.816 (\(\pm\) 0.020), indicating that an increase in sepal width of 1 cm is associated with an increase in sepal length of 0.816 cm. This slope was significantly different from zero (t = 39.858, d.f. = 148, p < 0.001). The adjusted R-squared value for the model was 0.642, indicating that sepal width explains approximately 64% of the variance in sepal length. These results suggest that sepal width is a strong predictor of sepal length.
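As with the t-test, these numbers can be extracted from the fitted model object rather than retyped. A minimal sketch, assuming the model fitted above is stored in model:

# Coefficient table: estimates, standard errors, t values and p values
coef(summary(model))

# Adjusted R-squared quoted in the text
summary(model)$adj.r.squared

# Confidence intervals for the coefficients, if you prefer to report those
confint(model)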
23.3 A Generalised linear model (GLM)
In this fictitious example we are looking at the results of a Poisson GLM used to study the number of offspring produced by arctic foxes of different weights. We use a Poisson GLM because these are count data (number of babies). The model will help us understand how maternal weight is associated with reproductive success.
For the methods, I might write something like this:
We fitted a Poisson generalised linear model (GLM) to study the number of offspring produced by a population of arctic foxes. The model included weight as a continuous predictor. We then summarised the fitted model using anova() and summary() to produce the analysis of deviance table and the summary of coefficients respectively.
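In R this might look something like the sketch below. The data frame fox and the variable names come from the summary() output shown later in this section; the object name model is simply my choice, and the chi-squared test in anova() is assumed from the Pr(>Chi) column in the output.

# Fit a Poisson GLM with the default log link
model <- glm(noffspring ~ weight, family = poisson, data = fox)

# Analysis of deviance table (chi-squared tests) and coefficient summary
anova(model, test = "Chisq")
summary(model)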
Here’s what the anova() output shows:
## Analysis of Deviance Table
##
## Model: poisson, link: log
##
## Response: noffspring
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 99 166.84
## weight 1 44.124 98 122.72 3.082e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The analysis of deviance results showed that maternal weight was strongly associated with the number of offspring (Poisson GLM: Null Deviance = 166.8, Residual Deviance = 122.7, d.f. = 1 and 98, p < 0.001). This indicates that there is a significant relationship between the weight of the mother and the number of offspring she produced.
We could also add something about the deviances. In this case, the model has a null deviance of 166.84 and weight explains 44.124 of that deviance. We can therefore calculate a kind of pseudo-\(R^2\) value as 44.124/166.84 = 0.264, i.e. 26.4%.
So we could say something like: The deviances and residual deviance showed that the model explained 26.4% of the variation in number of offspring.
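The same calculation can be done directly from the fitted model object, which avoids rounding the deviances by hand. A minimal sketch, again assuming the fitted model is stored in model:

# Proportion of the null deviance explained by weight (a pseudo-R-squared)
pseudo_r2 <- (model$null.deviance - model$deviance) / model$null.deviance
pseudo_r2                    # approximately 0.264
round(100 * pseudo_r2, 1)    # approximately 26.4 (%)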
Here’s what the summary() output shows:
##
## Call:
## glm(formula = noffspring ~ weight, family = poisson, data = fox)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3891 -0.9719 -0.1183 0.5897 2.3426
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.74981 0.31107 -2.410 0.0159 *
## weight 0.63239 0.09502 6.655 2.83e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 166.85 on 99 degrees of freedom
## Residual deviance: 122.72 on 98 degrees of freedom
## AIC: 405.56
##
## Number of Fisher Scoring iterations: 5
From this we can say something like:
The slope of the relationship was 0.63 (on the log scale). The equation of the best-fit line was log(nOffspring) = -0.75 + 0.63 \(\times\) MotherWeight.
One could also add the standard error of the slope (0.09502) and intercept (0.31107) into this equation: log(nOffspring) = -0.75 \(\pm\) 0.31 + 0.63 \(\pm\) 0.10 \(\times\) MotherWeight.
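Because these coefficients are on the log scale, it can also help the reader to see what they imply on the original count scale. A minimal sketch; the mother’s weight of 3 is purely illustrative, and the units are whatever was used in the original data:

# Back-transform by hand using the rounded coefficients
exp(-0.75 + 0.63 * 3)    # roughly 3.1 offspring for a mother of weight 3

# Or let predict() do the back-transformation from the fitted model
predict(model, newdata = data.frame(weight = 3), type = "response")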
So, to put it together:
The analysis of deviance results showed that maternal weight was strongly associated with the number of offspring (Poisson GLM: Null Deviance = 166.8, Residual Deviance = 122.7, d.f. = 1 and 98, p < 0.001). This indicates that there is a significant relationship between the weight of the mother and the number of offspring she produced. The deviances and residual deviance showed that the model explained 26.4% of the variation in number of offspring. The coefficients allowed me to calculate the equation of the relationship as log(nOffspring) = -0.75 \(\pm\) 0.31 + 0.63 \(\pm\) 0.10 \(\times\) MotherWeight.
Things get a little more complicated with larger models, and at some point a table might be preferable. However, for a model with two terms, something like this would work:
Analysis of Deviance Table
Model: binomial, link: logit
Response: success
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 199 935.21
treatment 4 267.001 195 668.21 2.825e-15 ***
gender 1 112.298 194 555.91 0.0001076 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The deviance analysis of our logistic regression model demonstrated significant influences of both treatment and gender on success (Binomial GLM: Null Deviance = 935.21, null d.f. = 199). The inclusion of treatment accounted for a substantial reduction in deviance (Deviance = 267.001, d.f. = 4, p < 0.0001), and the addition of gender further reduced the deviance (Deviance = 112.298, d.f. = 1, p < 0.001). This suggests that both treatment and gender play crucial roles in determining the probability of success. The model as a whole explained 40.56% of the variation in success (measured as the proportion of the null deviance explained), with treatment being the major contributor, accounting for 70.39% of the explained deviance.
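The two percentages quoted above come straight from the deviances in the table, and can be checked with a couple of lines of arithmetic in R:

# Total deviance explained by the model (treatment plus gender)
(935.21 - 555.91) / 935.21        # about 0.4056, i.e. 40.56%

# Share of the explained deviance attributable to treatment
267.001 / (267.001 + 112.298)     # about 0.7039, i.e. 70.39%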