This course offers an introduction to advanced topics in statistics with a focus on understanding data in the behavioral and social sciences. It is a practical course in which learning statistical concepts and building models in R go hand in hand. The course is organized into three parts: In the first part, we will learn how to visualize, wrangle, and simulate data in R. In the second part, we will cover topics in frequentist statistics (such as multiple regression, logistic regression, and mixed effects models) using the general linear model as an organizing framework. We will learn how to compare models using simulation methods such as bootstrapping and cross-validation. In the third part, we will focus on Bayesian data analysis as an alternative framework for answering statistical questions.

Prerequisite: Psych 10, Stats 60, or equivalent.

Team

Role Instructor Instructor Teaching assistant Teaching assistant Teaching assistant Teaching assistant Teaching assistant
Pronouns he/him he/him she/her she/her he/him he/him she/her
Email (@stanford.edu) gerstenberg nilamram alicexue cgarton justin.yang grantsrb vyqlua
Office hours Wednesday 1:30-2:30pm Monday 1:30-2:30pm

Where and when?

All meetings are in person, at the times and locations shown below.


Lectures: The class meets Monday, Wednesday, and Friday 10:30-11:50am in 200-205 (Lane History Corner).

Sections: Sections are on Tuesdays and Thursdays 3:30-4:20pm in Hewlett Teaching Center Rm 101 (attendance is optional).

Overview

Day Date Topic
Monday January 6th Introduction
Wednesday January 8th Visualization 1
Friday January 10th Visualization 2
Monday January 13th Data wrangling 1
Wednesday January 15th Data wrangling 2
Friday January 17th Probability
Monday January 20th Martin Luther King Jr. Day
Wednesday January 22nd Simulation 1
Friday January 24th Simulation 2
Monday January 27th Modeling data
Wednesday January 29th Linear model 1
Friday January 31st Linear model 2
Monday February 3rd Linear model 3
Wednesday February 5th Linear model 4
Friday February 7th Generalized linear model
Monday February 10th Power analysis
Wednesday February 12th No class (due to Midterm)
Friday February 14th Model comparison
Monday February 17th Presidents’ Day
Wednesday February 19th Linear mixed effects models 1
Friday February 21st Linear mixed effects models 2
Monday February 24th Linear mixed effects models 3
Wednesday February 26th Linear mixed effects models 4
Friday February 28th Causation
Monday March 3rd Bayesian data analysis 1
Wednesday March 5th Bayesian data analysis 2
Friday March 7th Bayesian data analysis 3
Monday March 10th Summary and course outlook
Wednesday March 12th Guest lecture
Friday March 14th Guest lecture

Due dates

  • Thursday, January 16th: Homework 1
  • Thursday, January 23rd: Homework 2
  • Thursday, January 30th: Homework 3
  • Thursday, February 6th: Homework 4
  • Friday, February 14th: Midterm
  • Thursday, February 20th: Project proposal
  • Thursday, February 27th: Homework 5
  • Thursday, March 6th: Homework 6
  • Thursday, March 13th: Homework 7 (optional)
  • TBD: Final project presentation
  • Friday, March 21st: Final project report

More information about each class

Introduction

Content:

  • Course introduction

Resources:

Datacamp:

Visualization 1

Content:

  • Get familiar with the RStudio interface.
  • Take a look at some suboptimal plots, and think about how to make them better.
  • Understand the general philosophy behind ggplot2 – a grammar of graphics.
  • Understand the mapping from data to geoms in ggplot2.
  • Create informative figures using grouping and facets.

Resources:

Datacamp:

Reading:

Visualization 2

Content:

  • Decide what plot is appropriate for what kind of data.
  • Customize plots: Take a sad plot and make it better.
  • Save plots.
  • Make figure panels.
  • Debug.
  • Make animations.
  • Define snippets.

Resources:

Datacamp:

Reading:

Data wrangling 1

Content:

  • Review R basics (incl. variable modes, data types, operators, control flow, and functions).
  • Learn how the pipe operator %>% works.
  • See different ways for getting a sense of one’s data.
  • Master key data manipulation verbs from the dplyr package (incl. filter(), rename(), select(), mutate(), and arrange()).

Resources:

Datacamp:

Reading:

Data wrangling 2

Content:

  • Learn how to group and summarize data using group_by() and summarize().
  • Learn how to deal with missing data entries (NA).
  • Get familiar with how to reshape data using pivot_longer(), pivot_wider(), separate() and unite().
  • Learn the basics of how to join multiple data frames with a focus on left_join().
  • Master how to read and save data.

Resources:

Datacamp:

Reading:

Probability

Content:

  • Refresh our understanding of probability theory.
    • Conditional probability.
    • Independence.
    • Joint probability.
    • Law of Total Probability.
    • Bayes’ rule.
  • Appreciate different interpretations of probability.
  • Gain a basic understanding of Bayesian networks and common patterns of inference.
  • Understand causal Bayesian networks: the difference between observation and intervention.

Resources:

Datacamp:

Reading:

Simulation 1

Content:

  • Working with probability distributions.
    • dnorm(), pnorm(), qnorm(), rnorm()
  • Computing probabilities.
  • Bayesian inference (analytic and via sampling).
  • Working with samples.
    • density(), quantile()
    • Comparing distributions.
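
As a rough illustration of the functions listed above, the following base R sketch works with a standard normal distribution. The seed, sample size, and parameter values are arbitrary choices made for this example and are not taken from the course materials.

```r
# Illustrative sketch only: the seed, sample size, and parameters are arbitrary.
set.seed(1)

# d/p/q/r functions for the normal distribution
dens_at_zero <- dnorm(0, mean = 0, sd = 1)   # density at x = 0
p_below <- pnorm(1.96, mean = 0, sd = 1)     # P(X <= 1.96)
q_upper <- qnorm(0.975, mean = 0, sd = 1)    # 97.5th percentile
samples <- rnorm(10000, mean = 0, sd = 1)    # random draws

# Working with samples
sample_quantiles <- quantile(samples, probs = c(0.025, 0.5, 0.975))
kernel_estimate <- density(samples)          # kernel density estimate

# Computing a probability analytically and approximating it via sampling
p_analytic <- pnorm(1) - pnorm(-1)               # P(-1 <= X <= 1)
p_sampled <- mean(samples >= -1 & samples <= 1)  # proportion of draws in [-1, 1]
```

With enough draws, the sampled proportion should come close to the analytic probability, which is the general idea behind approximating quantities via sampling.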

Datacamp:

Reading:

Simulation 2

Content:

  • The rationale behind statistical inference.
  • The central limit theorem.
  • Understanding sampling distributions.
  • Understanding p-values via a permutation test.
  • Correctly interpreting confidence intervals.
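
To make the idea of a permutation test concrete, here is a minimal base R sketch. The data are simulated, and the group sizes, effect size, number of permutations, and all variable names are assumptions made for this example.

```r
# Illustrative sketch only: the data are simulated, and the group sizes,
# effect size, and number of permutations are arbitrary assumptions.
set.seed(1)

n <- 50
group <- rep(c("control", "treatment"), each = n)
outcome <- c(rnorm(n, mean = 0), rnorm(n, mean = 0.5))

# Test statistic: difference between the group means
mean_difference <- function(y, g) {
  mean(y[g == "treatment"]) - mean(y[g == "control"])
}

observed <- mean_difference(outcome, group)

# Null distribution: shuffle the group labels and recompute the statistic
n_permutations <- 2000
null_differences <- replicate(
  n_permutations,
  mean_difference(outcome, sample(group))
)

# Two-sided p-value: proportion of shuffled differences that are at least
# as extreme as the observed difference
p_value <- mean(abs(null_differences) >= abs(observed))
```

Shuffling the labels breaks any link between group and outcome, so the shuffled differences show what the test statistic looks like when the null hypothesis is true.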

Resources:

Datacamp:

Reading:

Modeling data

Content:

  • Hypothesis testing as model comparison.
  • Modeling data: Data = Model + Error
  • Error and parameter estimates.
  • Properties of estimators.
  • Statistical inferences about parameter values.

Datacamp:

Reading:

Linear model 1

Content:

  • Correlation.
    • Pearson’s product-moment correlation.
    • Spearman’s rank correlation.
  • Regression.
    • Understand conceptually and learn how to do it in R.

Resources:

Reading:

Linear model 2

Content:

  • Multiple regression.
    • Appreciate model assumptions.
  • Several continuous predictors.
    • Hypothesis tests.
    • Interpreting parameters.
    • Reporting results.
  • One categorical predictor.
  • Both continuous and categorical predictors.
  • Interpreting interactions.

Resources:

Datacamp:

Reading:

Linear model 3

Content:

  • Linear model with one multi-level categorical predictor (One-way ANOVA).
  • Linear model with multiple categorical predictors (N-way ANOVA).
    • dummy-coding vs. effect-coding
    • planned contrasts

Resources:

Datacamp:

Reading:

Linear model 4

Content:

  • Interpreting ANOVA results.
  • Simulating data, inferring and interpreting parameters.
  • Planned contrasts.

Datacamp:

Reading:

Generalized linear model

Content:

  • Logistic regression.
  • Logit transform.
  • Fitting a logistic regression in R.
  • Visualizing and interpreting model predictions.
  • Simulating data from a logistic regression.
  • Assessing model fit.
  • Testing hypotheses.
  • Reporting results.

Resources:

Datacamp:

Reading:

Power analysis

Content:

  • Making decisions based on statistical inference.
  • The concept of statistical power.
  • Calculating power.
  • Common effect size measures.

Resources:

Datacamp:

Reading:

Model comparison

Content:

  • Model comparison.
  • Underfitting vs. overfitting.
  • Cross-validation.
    • Leave-one-out cross-validation.
    • k-fold cross-validation.
    • Monte Carlo cross-validation.
  • Information criteria: AIC and BIC.
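
The following base R sketch shows one way k-fold cross-validation and information criteria might be used to compare two models. The simulated data, the two candidate models, and the choice of k = 5 are assumptions made for this example.

```r
# Illustrative sketch only: simulated data; the two candidate models and
# k = 5 are arbitrary assumptions.
set.seed(1)

n <- 100
sim_data <- data.frame(x = runif(n, min = -2, max = 2))
sim_data$y <- 1 + 2 * sim_data$x + rnorm(n)  # the true relationship is linear

# Two candidate models: a line and a 6th-degree polynomial
formulas <- list(
  linear = y ~ x,
  polynomial = y ~ poly(x, 6)
)

# k-fold cross-validation: assign each row to one of k folds at random
k <- 5
fold <- sample(rep(1:k, length.out = n))

cv_rmse <- sapply(formulas, function(f) {
  squared_errors <- numeric(n)
  for (i in 1:k) {
    test_rows <- fold == i
    # Fit on the other folds, then predict the held-out fold
    fit <- lm(f, data = sim_data[!test_rows, ])
    prediction <- predict(fit, newdata = sim_data[test_rows, ])
    squared_errors[test_rows] <- (sim_data$y[test_rows] - prediction)^2
  }
  sqrt(mean(squared_errors))
})

# Information criteria computed on the full data set (lower is better)
fits <- lapply(formulas, lm, data = sim_data)
aic <- sapply(fits, AIC)
bic <- sapply(fits, BIC)
```

Cross-validation estimates prediction error on held-out data directly, whereas AIC and BIC penalize the in-sample fit for the number of parameters.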

Resources:

Datacamp:

Reading:

Linear mixed effects models 1

Content:

  • Understanding sources of dependence in data.
    • fixed effects vs. random effects.
  • lmer() syntax in R.
  • Understanding the lmer() summary.
  • Simulating data from an lmer().

Resources:

Datacamp:

Reading:

Linear mixed effects models 2

Content:

  • Understanding Simpson’s paradox.
  • An lmer() worked example.
    • complete pooling vs. no pooling vs. partial pooling.

Resources:

Reading:

Linear mixed effects models 3

Content:

  • Bootstrapping linear mixed effects models.
  • Getting p-values.
  • Pitfalls in fitting lmer()s (and what to do about them).
  • Understanding lmer() syntax even better.

Reading:

Linear mixed effects models 4

Content:

  • Some worked examples.
  • Doing follow-up tests with the emmeans package.

Causation

Content:

  • Simulating a mediation analysis.
  • Baron and Kenny’s (1986) steps for mediation.
  • Testing the significance of a mediation.
    • Sobel test.
    • Bootstrapping.
    • Bayesian approach.
  • Limitations of mediation analysis.
  • Simulating a moderator effect.

Resources:

Reading:

Bayesian data analysis 1

Content:

  • Doing Bayesian inference “by hand”.
    • Understanding the effect that prior, likelihood, and sample size have on the posterior.
  • Doing Bayesian data analysis with greta.
    • A simple linear regression.

Datacamp:

Reading:

Bayesian data analysis 2

Content:

  • Building Bayesian models with brms.
    • Model evaluation:
    • Visualizing and interpreting results.
    • Testing hypotheses.
    • Inference evaluation: Did things work out?

Reading:

Bayesian data analysis 3

Content:

  • Evidence for null results.
  • Only positive predictors.
  • Dealing with unequal variance.
  • Modeling slider data: Zero-one inflated beta binomial model.
  • Modeling Likert scale data: Ordinal logistic regression.

Resources:

Reading:

What you will learn

You will learn how to use R to …

  • read, wrangle, simulate and analyze data
  • make publication-ready plots

Understand the philosophy behind null hypothesis significance testing (NHST) and Bayesian statistics through …

  • running computer simulations and visualizing the results

Formulate research questions as statistical models and …

  • determine which models work for different situations

Communicate what you have learned about your data …

  • in short presentations in class, showcasing your visualization and analysis
  • in written reports

Contribute to open and reproducible science through …

  • adopting good coding practices
  • sharing your data and research reports online

What to expect?

What you can expect from us

We will …

  • start and end each class on time.
  • be there for you after class in case you have any questions.
  • be there for you during office hours.
  • not be able to provide general stats consultation. The Statistics Department provides consultations.

What we expect from you

You will …

  • attend the classes and participate in class discussion.
  • submit your homework assignments, midterm, and final project on time.

Resources

Readings

For many classes, there will be readings and/or accompanying online interactive tutorials. We won’t adopt a course textbook.

Course notes:

The course notes are available as an online book here.

Free online books:

Textbooks:

Grading

  • Homework: 40%
  • Midterm: 20%
  • Final project: 40%
    • Proposal: 5%
    • Presentation: 10%
    • Report: 25%
  • Bonus:
    • Ed discussion: 2%

Policies

Please familiarize yourself with Stanford’s honor code. We will adhere to it and follow through on its penalty guidelines.

When is the weekly homework due?

Each week, we will make the homework available on Friday after class. The homework is then due at 8pm on Thursday of the following week.

What if I turn my homework in late?

You have 5 slip days in total. Submitting a homework within 24 hours after the deadline costs one slip day (two slip days if you submit within 48 hours, and so on). Once you have used up all your slip days, any further late homework submissions will receive a score of 0.
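
The slip-day rule can be sketched as a small calculation in R. The function name, the example lateness values, and the handling of a submission that would exceed the remaining budget are assumptions made for this example; the policy text does not spell out that last case.

```r
# Illustrative sketch of the slip-day rule. The function name and the
# example values are hypothetical.
slip_days_needed <- function(hours_late) {
  # Each started 24-hour period after the deadline costs one slip day
  ceiling(pmax(hours_late, 0) / 24)
}

total_slip_days <- 5

# Hypothetical lateness (in hours) for five homework submissions
hours_late <- c(0, 3, 30, 0, 50)
days_needed <- slip_days_needed(hours_late)  # 0 1 2 0 3

# Assumption: a late submission is covered only if the running total of
# slip days stays within the budget of 5
within_budget <- cumsum(days_needed) <= total_slip_days
```

In this example, the first four submissions use 3 slip days in total, and the fifth would need 3 more, which exceeds the 2 that remain.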

Can we work in groups?

Work for the course will include both homework assignments and a final project.

  • Homework assignments: You are encouraged to work in groups. However, your writeup must be your own (both the code and any written text). You will indicate who you worked with on your writeup.
  • Final project: You can either work on your own, or in a group of no more than three members. The project expectations scale with the size of the group (i.e. more is expected from a 3-person group compared to an individual project). A group will jointly write the project proposal, give the class presentation, and prepare the final report. Every member of a group will receive the same grade.

Support

Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).

Stanford is committed to ensuring that all courses are financially accessible to its students. If you require assistance with the cost of course textbooks, supplies, materials and/or fees, you can contact the First Generation and/or Low-Income Student Success Center to learn about the FLIbrary and other resources they have available for support.

Stanford offers several tutoring and coaching services:

Feedback

We welcome feedback regarding the course at any point. Please feel free to email us directly, or leave anonymous feedback for the teaching team by using this form.