Learning R: 2019 Summer Workshop for NYU CDSC Lab

Meeting time: Tuesdays from 1:30 to 3 pm (beginning June 11) | Instructor: Kelsey Moty

This workshop will help you to learn the fundamentals of R needed to manipulate, visualize, and describe your data. This workshop has a particular emphasis on producing clean and reproducible code in line with coding and open science best practices. Due to time limitations, we will not be able to go over how to do statistical modeling in R; however, I will provide a series of resources at the end of this list that you can look over on your own.

Some of these resources may at times be redundant with one another. Feel free to skip over material that you feel comfortable with. Most importantly, make sure to work through the exercises! The best way to learn how to code is by actually coding :)

Each week, we will meet for an hour and a half to go over the topic for that week. This meeting is meant to be collaborative. You will work together with other people in our lab to get started with each week's R skill. However, you will also need to complete some of the lesson on your own time, as an hour and a half is likely not enough time to practice that week's topic. If questions come up outside of our Tuesday meeting, feel free to post them on the R Workshop Slack channel!

You will get to apply the skills learned in this workshop to a dataset from a research project you are currently working on in the lab. At the end of the workshop, you will share with other members of the lab the dataset you cleaned up, a plot you created from that dataset, and some kind of analysis you did on that dataset (whether descriptive or inferential).


Before we begin, this workshop pulls from resources written by a lot of amazing people and they deserve credit for it!

A number of the book chapters and other resources we are reading were written by Hadley Wickham, Danielle Navarro, Jenny Bryan, Jim Hester, Kieran Healy, and Andy Field. Several of the tutorials we are working through are from a course that was taught by Dale Barr and Lisa DeBruine.


Getting your data ready for statistical analysis


Moving forward: Other things you should learn about R:
    Improve your programming skills and gain a deep understanding of the R language:
      Book: Advanced R: Book on advanced topics in R, including a more in-depth discussion of the foundations of R plus chapters on functional programming, metaprogramming, and performant code. If you can understand the concepts in this book, you will have a strong foundation for learning any programming language.

    Manipulating strings and pattern matching in R using regular expressions:

    data.table: An alternative approach for wrangling data
    You may be asking: if I can already wrangle my data using tidyverse, why learn another approach? While I personally prefer tidyverse or base R functions for manipulating data (I find these approaches more readable and user-friendly), data.table is more efficient. That is, code using data.table runs more quickly, which is particularly useful when working with large datasets. Many psychology researchers wouldn't notice any gains in speed by implementing their code using data.table (our datasets just aren't that big). But! If you work with big datasets (hundreds of thousands to millions of rows), data.table can help speed up data processing.
      Reading: Introduction to data.table: Vignette on data.table's syntax and how to perform actions comparable to those in tidyverse's dplyr and tidyr packages
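
To give a feel for the syntax, here is a minimal sketch of a grouped summary in data.table, next to the same summary in base R. It assumes the data.table package is installed; the `trials` data frame and its columns (`condition`, `rt`) are made up for illustration and are not from any lab dataset.

```r
library(data.table)

# Hypothetical toy data: 100 trials across two conditions
set.seed(1)
trials <- data.frame(
  condition = rep(c("a", "b"), times = 50),
  rt        = rnorm(100, mean = 500, sd = 50)
)

# Base R: mean reaction time per condition
base_means <- aggregate(rt ~ condition, data = trials, FUN = mean)

# data.table: same summary using the dt[i, j, by] form
trials_dt <- as.data.table(trials)
dt_means  <- trials_dt[, .(rt = mean(rt)), by = condition]

print(base_means)
print(dt_means)
```

On a dataset this small the two approaches should feel equally fast; any speed difference would only be expected to show up with much larger data.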
    When pre-registering your study, one best practice is to also pre-register all the R code you will use for your analyses. How do you write code without data? One way: simulate a dataset and use that data as you work through your analyses.
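
As a rough sketch of what simulating a dataset could look like, the base R code below builds a fake two-condition dataset and runs a planned analysis on it. Everything specific here is an assumption for illustration: the design (20 participants per group), the column names, the population means and standard deviation, and the choice of a t-test. Swap in whatever matches your own study.

```r
# Setting a seed makes the simulated data reproducible
set.seed(2019)

# Hypothetical between-subjects design: 20 participants per condition
n_per_group <- 20

sim_data <- data.frame(
  participant = seq_len(2 * n_per_group),
  condition   = rep(c("control", "treatment"), each = n_per_group),
  stringsAsFactors = FALSE
)

# Made-up population values: treatment mean 55, control mean 50, SD 10
sim_data$score <- rnorm(
  n    = nrow(sim_data),
  mean = ifelse(sim_data$condition == "treatment", 55, 50),
  sd   = 10
)

# Write the planned analysis against the simulated data
planned_test <- t.test(score ~ condition, data = sim_data)
print(planned_test)
```

Once real data come in, the same analysis code can be rerun with the real data frame in place of `sim_data`.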

General Resources


Cheat sheets

Various cheat sheets on a range of topics, including dplyr, ggplot2, R Markdown, and more!

Books / Tutorials

psyTeachR: Great resource that provides a number of interactive books and tutorials for doing reproducible research in R. This website covers a broad range of topics on data cleaning, visualization, reproducible workflows, and more. From their website: "Our curriculum now emphasizes essential ‘data science’ graduate skills that have been overlooked in traditional approaches to teaching, including programming skills, data visualisation, data wrangling and reproducible reports. Students learn about probability and inference through data simulation as well as by working with real datasets."