R Bootcamp Materials
Description
The R Bootcamp (also known as “Data Wrangling in R”) provides an overview of the fundamental concepts needed to work in the R statistical programming language. Through a series of annotated, hands-on lessons with practice exercises, students learn the basic principles of the R language, including:
- installing R and conducting basic operations in RStudio
- understanding R objects, assignment, and data types (e.g., numeric, character, logical)
- working with data structures (e.g., vectors, factors, data frames)
- using functions and packages to read and write data
- manipulating data
- basic and intermediate plotting
No prior programming experience is necessary to benefit from this course.
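As a small taste of the topics listed above (assignment, data types, vectors, factors, data frames, and simple manipulation), here is a minimal sketch in base R. It is not taken from the course scripts; every object name and value below (`roster`, `scores`, the student names) is made up for illustration.

```r
# Assignment and basic data types (all values are illustrative)
n_students <- 3L             # integer
avg_target <- 80.5           # numeric (double)
course     <- "R Bootcamp"   # character
is_open    <- TRUE           # logical

class(avg_target)            # inspect an object's class

# Data structures: a numeric vector and a factor
scores <- c(90, 72, 85)
level  <- factor(c("high", "low", "high"))

# A data frame combines equal-length columns of different types
roster <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  score = scores,
  level = level
)

# Basic manipulation: keep rows that meet a condition
passed <- roster[roster$score >= avg_target, ]

# A simple descriptive statistic
mean(roster$score)
```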
Note: For the Python version of this course, visit the python-bootcamp.
Table of Contents
Unit 1 (Script/HTML, Exercise, Solution):
- Installing R and RStudio
- The RStudio interface
- Arithmetic operators and mathematical functions
- Using functions and getting help
Unit 2 (Script/HTML, Exercise, Solution):
- Relational operators
- R data types
- R objects and assignment operators
Unit 3 (Script/HTML, Exercise, Solution):
- R data structures
- R date and date-time classes
Unit 4 (Script/HTML, Data, Exercise, Solution):
- R packages
- Understanding file paths
- The working directory
- Reading data files
- Writing data files
- Other ways to access data
Unit 5 (Script/HTML):
- Descriptives
- Subsetting
- Merging
- Logical operators
- Data manipulation
- Data restructuring
Unit 6 (Script/HTML, Example/HTML):
- Basic (base) plotting
- Saving plots to disk
- Plotting with ggplot2
- Plotting with lattice
Misc (Script/HTML):
- Monty Hall: A Simulation-Based Example of How to Write a Function
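
To give a sense of what a simulation-based function for the Monty Hall problem can look like, here is a minimal sketch in base R. It is not the code from the Misc script; the function names `play_monty_hall` and `simulate_monty_hall` and the argument `switch_door` are hypothetical choices made for this illustration.

```r
# Play one round of the Monty Hall game.
# Returns TRUE if the contestant ends up with the prize door.
play_monty_hall <- function(switch_door = TRUE) {
  doors <- 1:3
  prize <- sample(doors, 1)   # door hiding the prize
  pick  <- sample(doors, 1)   # contestant's first choice

  # The host opens a door that is neither the pick nor the prize.
  candidates <- setdiff(doors, c(pick, prize))
  # Guard against sample()'s behavior for a length-one numeric vector,
  # where sample(x, 1) would draw from 1:x instead of returning x.
  opened <- if (length(candidates) == 1) candidates else sample(candidates, 1)

  # Switching means taking the one remaining unopened door.
  if (switch_door) {
    pick <- setdiff(doors, c(pick, opened))
  }
  pick == prize
}

# Estimate the win probability over n simulated rounds.
simulate_monty_hall <- function(n = 10000, switch_door = TRUE) {
  mean(replicate(n, play_monty_hall(switch_door)))
}

set.seed(1)
simulate_monty_hall(switch_door = TRUE)    # should be near 2/3
simulate_monty_hall(switch_door = FALSE)   # should be near 1/3
```

Splitting the logic into a single-round function and a wrapper that repeats it keeps each piece easy to test by hand before scaling up the number of rounds.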
To-Do (Work-in-Progress):
- Add more (beyond Misc) on control-flow constructs (see ?Control: if/else and for/while/repeat loops)
- Debugging
- Python integration via reticulate
- Add basic string search/regular expressions
- Flesh out dplyr discussion and explain chaining more clearly (more generally, spend more time on tidyverse?)
- Discuss performance optimization (e.g., BLAS/LAPACK; parallel processing)