This course offers an introduction to advanced topics in statistics with a focus on understanding data in the behavioral and social sciences. It is a practical course in which learning statistical concepts and building models in R go hand in hand. The course is organized into three parts: In the first part, we will learn how to visualize, wrangle, and simulate data in R. In the second part, we will cover topics in frequentist statistics (such as multiple regression, logistic regression, and mixed effects models) using the general linear model as an organizing framework. We will learn how to compare models using simulation methods such as bootstrapping and cross-validation. In the third part, we will focus on Bayesian data analysis as an alternative framework for answering statistical questions.

Requirement: Psych 10, Stats 60, or equivalent.

2024 Team

  • Tobi Gerstenberg (he/him): Instructor, gerstenberg@stanford.edu
  • Ari Beller (they/them): Teaching assistant, abeller@stanford.edu
  • Beth Rispoli (she/her): Teaching assistant, brispoli@stanford.edu
  • Satchel Grant (he/him): Teaching assistant, grantsrb@stanford.edu
  • Shawn Schwartz (he/him): Teaching assistant, stschwartz@stanford.edu

Office hours: Monday 1-2pm

Where and when?

All meetings will be held in person, as shown below.


Lectures: The class meets Monday, Wednesday, and Friday 10:30-11:50am in 200-203 (Lane History Corner).

Sections: Sections are on Monday 3:30-4:20pm in 420-245 and on Friday 12:30-1:20pm in McMurtry Art Building 350 (attendance is optional).

Overview

Day Date Topic
Monday January 8th Introduction
Wednesday January 10th Visualization 1
Friday January 12th Visualization 2
Monday January 15th Martin Luther King Jr. Day
Wednesday January 17th Data wrangling 1
Friday January 19th Data wrangling 2
Monday January 22nd Probability
Wednesday January 24th Simulation 1
Friday January 26th Simulation 2
Monday January 29th Modeling data
Wednesday January 31st Linear model 1
Friday February 2nd Linear model 2
Monday February 5th Linear model 3
Wednesday February 7th Linear model 4
Friday February 9th Power analysis
Monday February 12th Model comparison
Wednesday February 14th No class (due to Midterm)
Friday February 16th Causation
Monday February 19th President’s Day
Wednesday February 21st Linear mixed effects models 1
Friday February 23rd Linear mixed effects models 2
Monday February 26th Linear mixed effects models 3
Wednesday February 28th Linear mixed effects models 4
Friday March 1st Generalized linear model
Monday March 4th Bayesian data analysis 1
Wednesday March 6th Bayesian data analysis 2
Friday March 8th Bayesian data analysis 3
Monday March 11th Summary and course outlook
Wednesday March 13th Guest lecture: Laura Gwilliams
Friday March 15th Guest lecture: Satchel Grant & Shawn Schwartz

Due dates

  • Thursday, January 18th: Homework 1
  • Thursday, January 25th: Homework 2
  • Thursday, February 1st: Homework 3
  • Thursday, February 8th: Homework 4
  • Friday, February 16th: Midterm
  • Thursday, February 22nd: Project proposal
  • Thursday, February 29th: Homework 5
  • Thursday, March 7th: Homework 6
  • Thursday, March 14th: Homework 7 (optional)
  • Monday, March 18th: Final project presentation
  • Friday, March 22nd: Final project report

More information about each class

Introduction

Content:

  • Course introduction

Resources:

Datacamp:

Visualization 1

Content:

  • Get familiar with the RStudio interface.
  • Take a look at some suboptimal plots, and think about how to make them better.
  • Understand the general philosophy behind ggplot2 – a grammar of graphics.
  • Understand the mapping from data to geoms in ggplot2.
  • Create informative figures using grouping and facets.

Resources:

Datacamp:

Reading:

Visualization 2

Content:

  • Decide what plot is appropriate for what kind of data.
  • Customize plots: Take a sad plot and make it better.
  • Save plots.
  • Make figure panels.
  • Debug.
  • Make animations.
  • Define snippets.

Resources:

Datacamp:

Reading:

Data wrangling 1

Content:

  • Review R basics (incl. variable modes, data types, operators, control flow, and functions).
  • Learn how the pipe operator %>% works.
  • See different ways for getting a sense of one’s data.
  • Master key data manipulation verbs from the dplyr package (incl. filter(), rename(), select(), mutate(), and arrange()).
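
To give a flavor of how the pipe and these verbs fit together, here is a minimal sketch using the mtcars data set that ships with R. The choice of data set, the new column names (horsepower, kpl), and the object name df_small are assumptions made for illustration, not part of the course materials.

```r
library(dplyr)

# Each verb takes a data frame and returns a data frame,
# so the steps can be chained with the pipe operator %>%
df_small <- mtcars %>%
  filter(cyl == 4) %>%                 # keep only cars with 4 cylinders
  rename(horsepower = hp) %>%          # rename: new_name = old_name
  select(mpg, horsepower, wt) %>%      # keep only these three columns
  mutate(kpl = mpg * 0.425) %>%        # add a column: approx. km per liter
  arrange(desc(kpl))                   # sort from most to least efficient

head(df_small)
```

Reading the chain from top to bottom mirrors the order in which the operations are applied, which is the main appeal of the pipe.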

Resources:

Datacamp:

Reading:

Data wrangling 2

Content:

  • Learn how to group and summarize data using group_by() and summarize().
  • Learn how to deal with missing data entries NA.
  • Get familiar with how to reshape data using pivot_longer(), pivot_wider(), separate() and unite().
  • Learn the basics of how to join multiple data frames with a focus on left_join().
  • Master how to read and save data.
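
The following sketch shows how reshaping, grouped summaries, missing values, and joins might come together on a small example. All data frames and column names here (df_wide, df_info, score, and so on) are hypothetical and made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per participant, one column per condition
df_wide <- tibble(
  participant = 1:3,
  control     = c(10, 12, NA),
  treatment   = c(14, 15, 13)
)

# Reshape from wide to long: one row per participant and condition
df_long <- df_wide %>%
  pivot_longer(cols = c(control, treatment),
               names_to = "condition",
               values_to = "score")

# Summarize by group, ignoring missing values when computing the mean
df_summary <- df_long %>%
  group_by(condition) %>%
  summarize(mean_score = mean(score, na.rm = TRUE),
            n_missing  = sum(is.na(score)))

# Add participant-level information (hypothetical) with a left join
df_info   <- tibble(participant = 1:3, age = c(21, 34, 28))
df_joined <- left_join(df_long, df_info, by = "participant")
```

Without na.rm = TRUE, the mean for the control condition would be NA, because a single missing entry makes the whole mean undefined.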

Resources:

Datacamp:

Reading:

Probability

Content:

  • Refresh our understanding of probability theory.
    • Conditional probability.
    • Independence.
    • Joint probability.
    • Law of Total Probability.
    • Bayes’ rule.
  • Appreciate different interpretations of probability.
  • Basic understanding of Bayesian networks and common patterns of inference.
  • Causal Bayesian networks: difference between observation and intervention.

Resources:

Datacamp:

Reading:

Simulation 1

Content:

  • Working with probability distributions.
    • dnorm(), pnorm(), qnorm(), rnorm()
  • Computing probabilities.
  • Bayesian inference (analytic and via sampling).
  • Working with samples.
    • density(), quantile()
    • Comparing distributions.
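
As a rough illustration of the four distribution functions listed above, the following base R sketch uses a standard normal distribution. The seed and the number of samples are arbitrary choices for illustration.

```r
# d = density, p = cumulative probability, q = quantile, r = random draws
dnorm(0)        # height of the density at x = 0 (about 0.399)
pnorm(1.96)     # P(X <= 1.96), about 0.975
qnorm(0.975)    # value below which 97.5% of the probability mass lies

set.seed(1)                      # arbitrary seed, for reproducibility only
samples <- rnorm(n = 10000)      # 10,000 draws from N(0, 1)

# Working with samples: compare sample quantiles to the theoretical ones
quantile(samples, probs = c(0.025, 0.5, 0.975))
qnorm(c(0.025, 0.5, 0.975))

# Estimate P(X > 1) from the samples and compare to the analytic answer
mean(samples > 1)
1 - pnorm(1)
```

Note that pnorm() and qnorm() are inverses of each other, and that the sample-based estimates approach the analytic values as the number of draws grows.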

Datacamp:

Reading:

Simulation 2

Content:

  • The rationale behind statistical inference.
  • The central limit theorem.
  • Understanding sampling distributions.
  • Understanding p-values via a permutation test.
  • Correctly interpreting confidence intervals.
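
One way to make the logic of a p-value concrete is a permutation test on the difference between two group means. The sketch below uses simulated data; the group sizes, effect size, seed, and number of permutations are all assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed

# Hypothetical data: two groups of 20 observations each
group_a <- rnorm(20, mean = 0, sd = 1)
group_b <- rnorm(20, mean = 1, sd = 1)

values <- c(group_a, group_b)
labels <- rep(c("a", "b"), each = 20)

# Observed difference in group means
observed_diff <- mean(values[labels == "b"]) - mean(values[labels == "a"])

# Null distribution: shuffle the group labels and recompute the difference
n_permutations <- 5000
permuted_diffs <- replicate(n_permutations, {
  shuffled <- sample(labels)
  mean(values[shuffled == "b"]) - mean(values[shuffled == "a"])
})

# Two-sided p-value: share of shuffled differences at least as extreme
p_value <- mean(abs(permuted_diffs) >= abs(observed_diff))
p_value
```

Shuffling the labels simulates a world in which group membership is unrelated to the outcome, so the p-value is the proportion of such worlds that produce a difference at least as large as the one observed.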

Resources:

Datacamp:

Reading:

Modeling data

Content:

  • Hypothesis testing as model comparison.
  • Modeling data: Data = Model + Error
  • Error and parameter estimates.
  • Properties of estimators.
  • Statistical inferences about parameter values.

Datacamp:

Reading:

Linear model 1

Content:

  • Correlation.
    • Pearson’s moment correlation.
    • Spearman’s rank correlation.
  • Regression.
    • Understand conceptually and learn how to do it in R.
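
A minimal base R sketch of both correlation measures and a simple regression, using simulated data (the sample size, slope, and seed are assumptions for illustration):

```r
set.seed(1)  # arbitrary seed

# Hypothetical data with a linear relationship plus noise
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

r_pearson  <- cor(x, y, method = "pearson")   # linear association
r_spearman <- cor(x, y, method = "spearman")  # association between ranks

# Simple linear regression of y on x
fit <- lm(y ~ x)
summary(fit)

# With a single predictor, the standardized slope equals Pearson's r
standardized_slope <- coef(fit)[["x"]] * sd(x) / sd(y)
c(standardized_slope, r_pearson)
```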

Resources:

Reading:

Linear model 2

Content:

  • Multiple regression.
    • Appreciate model assumptions.
  • Several continuous predictors.
    • Hypothesis tests.
    • Interpreting parameters.
    • Reporting results.
  • One categorical predictor.
  • Both continuous and categorical predictors.
  • Interpreting interactions.

Resources:

Datacamp:

Reading:

Linear model 3

Content:

  • Linear model with one multi-level categorical predictor (One-way ANOVA).
  • Linear model with multiple categorical predictors (N-way ANOVA).
    • dummy-coding vs. effect-coding
    • planned contrasts

Resources:

Datacamp:

Reading:

Linear model 4

Content:

  • Interpreting ANOVA results.
  • Simulating data, inferring and interpreting parameters.
  • Planned contrasts.

Datacamp:

Reading:

Power analysis

Content:

  • Making decisions based on statistical inference.
  • The concept of statistical power.
  • Calculating power.
  • Common effect size measures.

Resources:

Datacamp:

Reading:

Model comparison

Content:

  • Model comparison.
  • Underfitting vs. overfitting.
  • Cross-validation.
    • Leave-one-out cross-validation.
    • k-fold cross-validation.
    • Monte Carlo cross-validation.
  • Information criteria: AIC and BIC.
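
The sketch below compares a simple and a more flexible model on simulated data, first with information criteria and then with k-fold cross-validation written out by hand. The data-generating process, the two candidate models, and the choice of k = 5 are assumptions for illustration, not the course's own example.

```r
set.seed(1)  # arbitrary seed

# Hypothetical data: y depends linearly on x
n  <- 100
df <- data.frame(x = runif(n, min = -2, max = 2))
df$y <- 1 + 2 * df$x + rnorm(n, sd = 1)

# Two candidate models: a linear fit and a flexible fifth-degree polynomial
formulas <- list(
  linear = y ~ x,
  poly5  = y ~ poly(x, 5)
)

# Information criteria (lower values indicate a better trade-off)
fits <- lapply(formulas, lm, data = df)
sapply(fits, AIC)
sapply(fits, BIC)

# k-fold cross-validation: fit on k - 1 folds, predict the held-out fold
k    <- 5
fold <- sample(rep(1:k, length.out = n))

cv_rmse <- sapply(formulas, function(f) {
  fold_errors <- sapply(1:k, function(i) {
    fit         <- lm(f, data = df[fold != i, ])
    predictions <- predict(fit, newdata = df[fold == i, ])
    sqrt(mean((df$y[fold == i] - predictions)^2))
  })
  mean(fold_errors)
})
cv_rmse
```

Setting k equal to the number of observations turns this into leave-one-out cross-validation.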

Resources:

Datacamp:

Reading:

Causality

Content:

  • Simulating a mediation analysis.
  • Baron and Kenny’s (1986) steps for mediation.
  • Testing the significance of a mediation.
    • Sobel test.
    • Bootstrapping.
    • Bayesian approach.
  • Limitations of mediation analysis.
  • Simulating a moderator effect.

Resources:

Reading:

Linear mixed effects models 1

Content:

  • Understanding sources of dependence in data.
    • fixed effects vs. random effects.
  • lmer() syntax in R.
  • Understanding the lmer() summary.
  • Simulating data from an lmer().

Resources:

Datacamp:

Reading:

Linear mixed effects models 2

Content:

  • Understanding Simpson’s paradox.
  • An lmer() worked example.
    • complete pooling vs. no pooling vs. partial pooling.

Resources:

Reading:

Linear mixed effects models 3

Content:

  • Bootstrapping linear mixed effects models.
  • Getting p-values.
  • Pitfalls in fitting lmer()s (and what to do about them).
  • Understanding lmer() syntax even better.

Reading:

Linear mixed effects models 4

  • Some worked examples.
  • Doing follow-up tests with the emmeans package.

Generalized linear model

Content:

  • Logistic regression.
  • Logit transform.
  • Fitting a logistic regression in R.
  • Visualizing and interpreting model predictions.
  • Simulating data from a logistic regression.
  • Assessing model fit.
  • Testing hypotheses.
  • Reporting results.
  • Mixed effects logistic regression.
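
To illustrate the link between simulating from and fitting a logistic regression, here is a minimal base R sketch. The parameter values b0 and b1, the sample size, and the seed are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed

# Simulate data from a logistic regression with assumed parameters
n  <- 1000
b0 <- -1   # intercept on the log-odds scale (assumed)
b1 <- 2    # slope on the log-odds scale (assumed)

x        <- rnorm(n)
log_odds <- b0 + b1 * x
p        <- plogis(log_odds)        # inverse logit: 1 / (1 + exp(-log_odds))
y        <- rbinom(n, size = 1, prob = p)

# Fit the model; the estimates should land near b0 and b1
fit <- glm(y ~ x, family = binomial(link = "logit"))
coef(fit)

# Predicted probability of y = 1 for a new observation at x = 0.5
predict(fit, newdata = data.frame(x = 0.5), type = "response")
```

The coefficients are on the log-odds scale, so plogis() (or type = "response" in predict()) is needed to turn them into probabilities.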

Resources:

Datacamp:

Reading:

Bayesian data analysis 1

Content:

  • Doing Bayesian inference “by hand”.
    • Understanding the effect that prior, likelihood, and sample size have on the posterior.
  • Doing Bayesian data analysis with greta.
    • A simple linear regression.

Datacamp:

Reading:

Bayesian data analysis 2

Content:

  • Building Bayesian models with brms.
    • Model evaluation:
      • Visualizing and interpreting results.
      • Testing hypotheses.
    • Inference evaluation: Did things work out?

Reading:

Bayesian data analysis 3

Content:

  • Evidence for null results.
  • Only positive predictors.
  • Dealing with unequal variance.
  • Modeling slider data: Zero-one inflated beta binomial model.
  • Modeling Likert scale data: Ordinal logistic regression.

Resources:

Reading:

What you will learn

You will learn how to use R to …

  • read, wrangle, simulate and analyze data
  • make publication-ready plots

Understand the philosophy behind null hypothesis significance testing (NHST) and Bayesian statistics through …

  • running computer simulations and visualizing the results

Formulate research questions as statistical models and …

  • determine which models work for different situations

Communicate what you have learned about your data …

  • in short presentations in class, showcasing your visualization and analysis
  • in written reports

Contribute to open and reproducible science through …

  • adopting good coding practices
  • sharing your data and research reports online

What to expect?

In “A Vision for Stanford”, university president Marc Tessier-Lavigne states that Stanford wants to be

“an inspired, inclusive and collaborative community of diverse scholars, students and staff, where all are supported and empowered to thrive.”

Let’s try our best together in this class to make this happen!

What you can expect from me

I will …

  • start and end each class on time.
  • be there for you after class in case you have any questions.
  • be there for you during office hours.
  • not be able to provide general stats consultation. The Statistics Department provides consultations.

What I expect from you

You will …

  • attend the classes and participate in class discussion.
  • submit your homework assignments, midterm, and final project on time.

Resources

Readings

For many classes, there will be readings and/or accompanying online interactive tutorials. We won’t adopt a course textbook.

Course notes:

The course notes are available as an online book here.

Free online books:

Textbooks:

Grading

  • Homework: 40%
  • Midterm: 20%
  • Final project: 40%
    • Proposal: 5%
    • Presentation: 10%
    • Report: 25%
  • Bonus:
    • Ed discussion: 2%

Policies

Please familiarize yourself with Stanford’s honor code. We will adhere to it and follow through on its penalty guidelines.

When is the weekly homework due?

Each week, we will make the homework available on Friday after class. The homework is then due at 8pm on Thursday of the following week.

What if I turn my homework in late?

You will have 5 slip days in total. If you return a homework within 24h after the deadline, this costs you one slip day (or 2 slip days if you return it within 48h, etc.). If you’ve used up all your slip days, late homework submissions from that point on will receive a score of 0.
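
The slip day arithmetic can be sketched in R as follows. The helper slip_days_used() and the example lateness values are hypothetical, and the sketch assumes that a submission exactly 24h late still counts as one slip day.

```r
# Hypothetical helper: slip days consumed by a submission that arrives
# `hours_late` hours after the deadline (0 or less means on time)
slip_days_used <- function(hours_late) {
  pmax(0, ceiling(hours_late / 24))
}

slip_days_used(c(0, 3, 24, 25, 47))  # 0 1 1 2 2

# Remaining budget after three hypothetical submissions
total_slip_days <- 5
hours_late      <- c(hw1 = 0, hw2 = 30, hw3 = 10)
remaining       <- total_slip_days - sum(slip_days_used(hours_late))
remaining  # 2
```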

Can we work in groups?

Work for the course will include both homework assignments and a final project.

  • Homework assignments: You are encouraged to work in groups. However, your writeup must be your own (both the code and any written text). You will indicate who you worked with on your writeup.
  • Final project: You can either work on your own, or in a group of no more than three members. The project expectations scale with the size of the group (i.e. more is expected from a 3-person group compared to an individual project). A group will jointly write the project proposal, give the class presentation, and prepare the final report. Every member of a group will receive the same grade.

Support

Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).

Stanford is committed to ensuring that all courses are financially accessible to its students. If you require assistance with the cost of course textbooks, supplies, materials and/or fees, you can contact the First Generation and/or Low-Income Student Success Center to learn about the FLIbrary and other resources they have available for support.

Stanford offers several tutoring and coaching services:

Feedback

We welcome feedback regarding the course at any point. Please feel free to email us directly, or leave anonymous feedback for the teaching team by using this form.