Meeting time: Tuesdays from 1:30 to 3 pm (beginning June 11) | Instructor: Kelsey Moty
This workshop will help you to learn the fundamentals of R needed to manipulate, visualize, and describe your data. This workshop has a particular
emphasis on producing clean and reproducible code in line with coding and open science best practices. Due to time limitations, we will not be able
to go over how to do statistical modeling in R; however, I will provide a series of resources at the end of this list that you can look over on your
own.
Some of these resources may at times be redundant with one another. Feel free to skip over material that you feel comfortable with. Most importantly,
make sure to work through the exercises! The best way to learn how to code is by actually coding :)
Each week, we will meet for an hour and a half to go over the topic for that week. This meeting is meant to be collaborative. You will work together
with other people in our lab to get started with each week's R skill. However, you will also need to complete some of the lesson on your own time,
as an hour and a half is likely not enough time to practice that week's topic. If questions come up outside of our Tuesday meeting, feel free to post
them on the R Workshop Slack channel!
You will get to apply the skills learned in this workshop to a dataset from a research project you are currently working on in the lab. At the
end of the workshop, you will share with other members of the lab the dataset you cleaned up, a plot you created from that dataset, and some kind of
analysis you did on that dataset (whether descriptive or inferential).
Before we begin, this workshop pulls from resources written by a lot of amazing people and they deserve credit for it!
Downloading R: Download the appropriate version for your operating system (Mac or Windows)
Downloading RStudio: RStudio makes it much easier to code in R
Improve your programming skills and gain a deep understanding of the R language:
Book: Advanced R: Book on advanced topics in R, including a more in-depth discussion of the foundations of
R plus chapters on functional programming, metaprogramming, and performant code. If you can understand the concepts in this book, you will have a strong foundation for
learning any programming language.
Manipulating strings and pattern matching in R using regular expressions:
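As a quick taste of what this looks like in practice, here is a minimal sketch using only base R functions (sub, grepl, trimws, tolower). The responses vector and its "ID_###: answer" layout are made up for illustration and are not taken from any of the workshop datasets.

```r
# Hypothetical raw responses with inconsistent case and stray whitespace
responses <- c("ID_001: yes", "ID_002: No ", "id_003: YES")

# Capture the digits that follow "ID_" (matching case-insensitively)
ids <- sub("^id_([0-9]+):.*$", "\\1", responses, ignore.case = TRUE)

# Drop everything up to and including the colon, then tidy what is left
answers <- tolower(trimws(sub("^[^:]*:", "", responses)))

# Flag the responses that are exactly "yes" after cleaning
is_yes <- grepl("^yes$", answers)
```

The stringr package in the tidyverse offers similar functionality with a more consistent interface; base R is used here only so the sketch runs without installing anything.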
data.table: An alternative approach for wrangling data
You may be asking: if I can already wrangle my data using the tidyverse, why learn another approach? While I personally prefer to use tidyverse or base R functions for
manipulating data (I find these approaches to be more readable and user-friendly), data.table is more efficient. That is, code using data.table generally runs more
quickly, which is particularly useful when working with large datasets. Many psychology researchers wouldn't notice any gains in speed by implementing their code using
data.table (our datasets just aren't that big). But! If you work with big datasets (like hundreds of thousands to millions of rows of data), data.table can help speed
up data processing.
Reading: Introduction to data.table: Vignette on data.table's syntax and how to perform actions comparable to those in tidyverse's dplyr and tidyr packages
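To give a feel for the syntax before you read the vignette, here is a minimal sketch of data.table's dt[i, j, by] form. It assumes the data.table package is installed; the toy dataset and its column names (participant, condition, rt) are made up for illustration.

```r
library(data.table)

# Hypothetical toy data: two trials from each of three participants
dt <- data.table(
  participant = c(1, 1, 2, 2, 3, 3),
  condition   = c("a", "b", "a", "b", "a", "b"),
  rt          = c(510, 620, 480, 700, 530, 650)
)

# dt[i, j, by]: keep rows (i), compute on columns (j), within groups (by).
# Roughly comparable to dplyr's filter(), then group_by(), then summarise().
cond_means <- dt[rt < 700, .(mean_rt = mean(rt), n = .N), by = condition]

# := adds a column in place, roughly comparable to dplyr's mutate()
dt[, rt_sec := rt / 1000]
```

Note that := modifies dt by reference rather than returning a copy, which is part of why data.table can be fast on large datasets and also something to keep in mind if you are used to dplyr.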
When pre-registering your study, one best practice is to also pre-register all the R code you will use for your analyses. How do you write code without data?
One way: simulate a dataset and use that data as you work through your analyses.
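Here is a minimal sketch of that idea using only base R. The design (two conditions, 50 participants each) and the means and standard deviation are placeholder assumptions; in a real pre-registration you would pick values that reflect your own study.

```r
set.seed(123)  # fix the random seed so the simulated data are reproducible

n_per_group <- 50  # hypothetical sample size per condition

# Simulate one score per participant in each of two conditions
sim_data <- data.frame(
  id        = seq_len(2 * n_per_group),
  condition = rep(c("control", "treatment"), each = n_per_group),
  score     = c(
    rnorm(n_per_group, mean = 100, sd = 15),  # assumed control distribution
    rnorm(n_per_group, mean = 108, sd = 15)   # assumed treatment distribution
  )
)

# Write the planned analysis code against the simulated data
group_means <- aggregate(score ~ condition, data = sim_data, FUN = mean)
test_result <- t.test(score ~ condition, data = sim_data)
```

Once the real data come in, the same analysis code should run unchanged as long as the real dataset uses the same column names and types as sim_data.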
Reading: Getting started simulating data in R: Blog with a good introduction to some of the functions you will use when simulating a dataset.
Reading + exercises: Lab to practice simulating data using R
Slides + exercises: Slides on simulating data using R: These slides include a series of exercises to work through as you go along
Book: Introduction to Scientific Programming and Simulation Using R: This book assumes no prior experience in programming or probability.
Section 3 on Probability and Section 4 on Simulation are the most relevant for those who already have experience programming in R (i.e., you are familiar with earlier programming topics discussed
on this syllabus).
psyTeachR: Great resource that provides a number of interactive books
and tutorials for doing reproducible research in R. This website covers a broad range of topics on data cleaning, visualization,
reproducible workflows, and more.
From their website: "Our curriculum now emphasizes essential ‘data science’ graduate
skills that have been overlooked in traditional approaches to teaching, including programming skills, data visualisation, data
wrangling and reproducible reports. Students learn about probability and inference through data simulation as well as by working
with real datasets."