
BB852 - Data handling, visualisation and statistics

Chapter 3 Additional recommended reading

The following readings can be downloaded via the link on itsLearning.

  • Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. The American Statistician, 72(1), 2–10.

  • Gotelli, N. J., & Ellison, A. M. (2013). Chapter 4, Framing and Testing Hypotheses. In A Primer of Ecological Statistics. Sinauer.

  • Petchey, O., Beckerman, A., & Childs, D. (2009). Shock and Awe by Statistical Software - Why R? Bulletin of the British Ecological Society, 40(4), 55–58.

  • Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biology, 13(4), e1002128. doi:[10.1371/journal.pbio.1002128](https://doi.org/10.1371/journal.pbio.1002128)

  • Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23.

The following websites are also useful.

  • The R graph gallery: https://www.r-graph-gallery.com/

  • STHDA, ggplot2 essentials: http://www.sthda.com/english/wiki/ggplot2-essentials

  • STHDA, R basics: http://www.sthda.com/english/wiki/r-basics-quick-and-easy