PSYCH 252: Statistical Methods

This course offers an introduction to advanced topics in statistics, with a focus on understanding data in the behavioral and social sciences. It is a practical course in which learning statistical concepts and building models in R go hand in hand. The course is organized into three parts. In the first part, we will learn how to visualize, wrangle, and simulate data in R. In the second part, we will cover topics in frequentist statistics (such as multiple regression, logistic regression, and mixed effects models), using the general linear model as an organizing framework. We will learn how to compare models using simulation methods such as bootstrapping and cross-validation. In the third part, we will focus on Bayesian data analysis as an alternative framework for answering statistical questions.

Requirement: Psych 10, Stats 60, or equivalent.

Team

	Tobi Gerstenberg	Nilam Ram	Alice Xue	Catherine Garton	Justin Yang	Satchel Grant	Verity Lua

Role	Instructor	Instructor	Teaching assistant	Teaching assistant	Teaching assistant	Teaching assistant	Teaching assistant
Pronouns	he/him	he/him	she/her	she/her	he/him	he/him	she/her
Email (@stanford.edu)	gerstenberg	nilamram	alicexue	cgarton	justin.yang	grantsrb	vyqlua
Office hours	Wednesday 1:30-2:30pm	Monday 12:30-1:45pm

Where and when?

All meetings are in person, at the times and places listed below.

Lectures: The class meets Monday, Wednesday, and Friday 10:30-11:50am in 200-205 (Lane History Corner).

Sections: Sections are on Tuesdays and Thursdays 3:30-4:20pm in Hewlett Teaching Center Rm 101 (attendance is optional).

Overview

Day	Date	Topic
Monday	January 6th	Introduction
Wednesday	January 8th	Visualization 1
Friday	January 10th	Visualization 2
Monday	January 13th	Data wrangling 1
Wednesday	January 15th	Data wrangling 2
Friday	January 17th	Probability
Monday	January 20th	Martin Luther King Jr. Day
Wednesday	January 22nd	Simulation 1
Friday	January 24th	Simulation 2
Monday	January 27th	Modeling data
Wednesday	January 29th	Linear model 1
Friday	January 31st	Linear model 2
Monday	February 3rd	Linear model 3
Wednesday	February 5th	Linear model 4
Friday	February 7th	Generalized linear model
Monday	February 10th	Power analysis
Wednesday	February 12th	No class (due to Midterm)
Friday	February 14th	Model comparison
Monday	February 17th	Presidents’ Day
Wednesday	February 19th	Linear mixed effects models 1
Friday	February 21st	Linear mixed effects models 2
Monday	February 24th	Linear mixed effects models 3
Wednesday	February 26th	Linear mixed effects models 4
Friday	February 28th	Causation
Monday	March 3rd	Bayesian data analysis 1
Wednesday	March 5th	Bayesian data analysis 2
Friday	March 7th	Bayesian data analysis 3
Monday	March 10th	Bayesian data analysis 4
Wednesday	March 12th	Summary and course outlook
Friday	March 14th	TA presentations

Due dates

Thursday, January 16th: Homework 1
Thursday, January 23rd: Homework 2
Thursday, January 30th: Homework 3
Thursday, February 6th: Homework 4
Friday, February 14th: Midterm
Thursday, February 20th: Project proposal
Thursday, February 27th: Homework 5
Thursday, March 6th: Homework 6
Thursday, March 13th: Homework 7 (optional)
Monday, March 17th: Final project presentation (3:30pm to 6:30pm)
Friday, March 21st: Final project report

More information about each class

Introduction

Content:

Course introduction

Resources:

Datacamp:

Visualization 1

Content:

Get familiar with the RStudio interface.
Take a look at some suboptimal plots, and think about how to make them better.
Understand the general philosophy behind ggplot2 – a grammar of graphics.
Understand the mapping from data to geoms in ggplot2.
Create informative figures using grouping and facets.

Resources:

Cheatsheet ggplot2

Datacamp:

Reading:

Visualization 2

Content:

Decide what plot is appropriate for what kind of data.
Customize plots: Take a sad plot and make it better.
Save plots.
Make figure panels.
Debug.
Make animations.
Define snippets.

Resources:

Cheatsheet shiny

Datacamp:

Reading:

Data wrangling 1

Content:

Review R basics (incl. variable modes, data types, operators, control flow, and functions).
Learn how the pipe operator %>% works.
Explore different ways of getting a sense of one’s data.
Master key data manipulation verbs from the dplyr package (incl. filter(), rename(), select(), mutate(), and arrange()).

Resources:

Datacamp:

Reading:

Data wrangling 2

Content:

Learn how to group and summarize data using group_by() and summarize().
Learn how to deal with missing data entries (NA).
Get familiar with how to reshape data using pivot_longer(), pivot_wider(), separate() and unite().
Learn the basics of how to join multiple data frames with a focus on left_join().
Master how to read and save data.

Resources:

Datacamp:

Reading:

Probability

Content:

Refresh our understanding of probability theory.
- Conditional probability.
- Independence.
- Joint probability.
- Law of Total Probability.
- Bayes’ rule.
Appreciate different interpretations of probability.
Basic understanding of Bayesian networks and common patterns of inference.
Causal Bayesian networks: difference between observation and intervention.

Resources:

Probability cheatsheet

Datacamp:

probability puzzles in R

Reading:

Course notes: Probability and causality

Simulation 1

Content:

Working with probability distributions.
- dnorm(), pnorm(), qnorm(), rnorm()
Computing probabilities.
Bayesian inference (analytic and via sampling).
Working with samples.
- density(), quantile()
- Comparing distributions.
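
As a rough illustration of the functions listed above, here is a minimal sketch in base R. The choice of a standard normal distribution, the seed, and the sample size are assumptions made for this example only and are not taken from the course notes.

```r
# Hypothetical example: a standard normal distribution (mean 0, sd 1).
set.seed(1)

d <- dnorm(0)       # density at x = 0
p <- pnorm(1.96)    # cumulative probability P(X <= 1.96)
q <- qnorm(0.975)   # quantile below which 97.5% of the probability mass lies
x <- rnorm(10000)   # 10,000 random draws

# Working with samples
sample_quantiles <- quantile(x, probs = c(0.025, 0.5, 0.975))
sample_density <- density(x)  # kernel density estimate, e.g. for plot(sample_density)

# Computing a probability by sampling and comparing it with the analytic value
p_sampled <- mean(x <= 1.96)
c(analytic = p, sampled = p_sampled)
```

With a large enough sample, the sampled probability should land close to the analytic one, which is one simple way of comparing a sample against a known distribution.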

Datacamp:

Foundations of Probability in R

Reading:

Course notes: Simulation 1

Simulation 2

Content:

The rationale behind statistical inference.
The central limit theorem.
Understanding sampling distributions.
Understanding p-values via a permutation test.
Correctly interpreting confidence intervals.
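
To give a flavor of the permutation-test idea mentioned above, here is a minimal sketch in base R. The two simulated groups, their means, the seed, and the number of permutations are all hypothetical and chosen only for illustration; the class itself may set this up differently.

```r
# Hypothetical data: two groups of 20 simulated observations each.
set.seed(1)
group_a <- rnorm(20, mean = 0)
group_b <- rnorm(20, mean = 1)

# Observed difference in group means
observed <- mean(group_b) - mean(group_a)

# Under the null hypothesis the group labels are exchangeable, so we shuffle
# the pooled values and recompute the difference many times.
values <- c(group_a, group_b)
idx_a <- seq_len(length(group_a))
permuted <- replicate(5000, {
  shuffled <- sample(values)
  mean(shuffled[-idx_a]) - mean(shuffled[idx_a])
})

# Two-sided p-value: proportion of shuffled differences at least as extreme
# as the observed one
p_value <- mean(abs(permuted) >= abs(observed))
p_value
```

The p-value here is simply the share of label shuffles that produce a difference at least as large as the one observed, which makes the "how surprising is this under the null?" logic explicit.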

Resources:

Central limit theorem visualization

Datacamp:

Foundations of Inference

Reading:

Course notes: Simulation 2

Modeling data

Content:

Hypothesis testing as model comparison.
Modeling data: Data = Model + Error
Error and parameter estimates.
Properties of estimators.
Statistical inferences about parameter values.

Datacamp:

Foundations of Inference

Reading:

Course notes: Modeling data
Data analysis: A model comparison approach to regression, ANOVA, and beyond (#1-4)

Linear model 1

Content:

Correlation.
- Pearson’s product-moment correlation.
- Spearman’s rank correlation.
Regression.
- Understand conceptually and learn how to do it in R.

Resources:

Reading:

Course notes: Linear model 1

Linear model 2

Content:

Multiple regression.
- Appreciate model assumptions.
Several continuous predictors.
- Hypothesis tests.
- Interpreting parameters.
- Reporting results.
One categorical predictor.
Both continuous and categorical predictors.
Interpreting interactions.

Resources:

Nice review of multiple regression in R

Datacamp:

inference for linear regression

Reading:

Linear model 3

Content:

Linear model with one multi-level categorical predictor (One-way ANOVA).
Linear model with multiple categorical predictors (N-way ANOVA).
- dummy-coding vs. effect-coding
- planned contrasts

Resources:

Explanation of different types of sums of squares

Datacamp:

modeling

Reading:

Course notes: Linear model 3

Linear model 4

Content:

Interpreting ANOVA results.
Simulating data, inferring and interpreting parameters.
Planned contrasts.

Datacamp:

inference in regression

Reading:

Course notes: Linear model 4

Generalized linear model

Content:

Logistic regression.
Logit transform.
Fitting a logistic regression in R.
Visualizing and interpreting model predictions.
Simulating data from a logistic regression.
Assessing model fit.
Testing hypotheses.
Reporting results.

Resources:

Binary prediction metrics

Datacamp:

Reading:

Course notes: Generalized linear model

Power analysis

Content:

Making decisions based on statistical inference.
The concept of statistical power.
Calculating power.
Common effect size measures.

Resources:

Datacamp:

functional programming

Reading:

Course notes: Power analysis

Model comparison

Content:

Model comparison.
Underfitting vs. overfitting.
Cross-validation.
- Leave-one-out cross-validation.
- k-fold cross-validation.
- Monte Carlo cross-validation.
Information criteria: AIC and BIC.

Resources:

caret: General framework for modeling data in R.
cross-validation for model selection: cross-validation in multi-level designs

Datacamp:

Reading:

Linear mixed effects models 1

Content:

Understanding sources of dependence in data.
- fixed effects vs. random effects.
lmer() syntax in R.
Understanding the lmer() summary.
Simulating data from an lmer().

Resources:

Simulating Data for Mixed effects models

Datacamp:

mixed effects model

Reading:

Linear mixed effects models 2

Content:

Understanding Simpson’s paradox.
An lmer() worked example.
- complete pooling vs. no pooling vs. partial pooling.

Resources:

Reading:

Linear mixed effects models 3

Content:

Bootstrapping linear mixed effects models.
Getting p-values.
Pitfalls in fitting lmer()s (and what to do about them).
Understanding lmer() syntax even better.

Reading:

Course notes: Linear mixed effects models 3

Linear mixed effects models 4

Some worked examples.
Doing follow-up tests with the emmeans package.

Causality

Content:

Simulating a mediation analysis.
Baron and Kenny’s (1986) steps for mediation.
Testing the significance of a mediation.
- Sobel test.
- Bootstrapping.
- Bayesian approach.
Limitations of mediation analysis.
Simulating a moderator effect.

Resources:

Reading:

Bayesian data analysis 1

Content:

Doing Bayesian inference “by hand”.
- Understanding the effect that prior, likelihood, and sample size have on the posterior.
Doing Bayesian data analysis with greta.
- A simple linear regression.

Datacamp:

Bayesian inference

Reading:

Course notes: Bayesian data analysis 1

Bayesian data analysis 2

Content:

Building Bayesian models with brms.
- Model evaluation:
- Visualizing and interpreting results.
- Testing hypotheses.
- Inference evaluation: Did things work out?

Reading:

Course notes: Bayesian data analysis 2

Bayesian data analysis 3

Content:

Evidence for null results.
Only positive predictors.
Dealing with unequal variance.
Modeling slider data: Zero-one inflated beta binomial model.
Modeling Likert scale data: Ordinal logistic regression.

Resources:

Reading:

What you will learn

You will learn how to use R to …

read, wrangle, simulate and analyze data
make publication-ready plots

Understand the philosophy behind null hypothesis significance testing (NHST) and Bayesian statistics through …

running computer simulations and visualizing the results

Formulate research questions as statistical models and …

determine which models work for different situations

Communicate what you have learned about your data …

in short presentations in class, showcasing your visualization and analysis
in written reports

Contribute to open and reproducible science through …

adopting good coding practices
sharing your data and research reports online

What to expect?

What you can expect from us

We will …

start and end each class on time.
be there for you after class in case you have any questions.
be there for you during office hours.
not be able to provide general stats consultation. The Statistics Department provides consultations.

What we expect from you

You will …

attend the classes and participate in class discussion.
submit your homework assignments, midterm, and final project on time.

Resources

Readings

For many classes, there will be readings and/or accompanying online interactive tutorials. We won’t adopt a course textbook.

Course notes:

The course notes are available as an online book here.

Free online books:

R for Data Science: Introduction to data manipulation and visualization in R.
Data visualization: A practical introduction: Data visualization in R.
The Effect: An Introduction to Research Design and Causality: Introduction to causal inference.
An Introduction to Data Analysis: Introduction to data analysis in R including plotting, data wrangling, and Bayesian data analysis.
An introduction to statistical learning: Introduction to statistical learning including regression, classification, resampling, and unsupervised learning techniques.
An Introduction to Statistical and Data Sciences via R: Free online book that introduces statistical concepts using many of the same tools we are using in class (ggplot2, dplyr, sampling methods, …).
OpenIntro Statistics: Introduction to statistics.
Statistical thinking for the 21st century: Online book written by Russ Poldrack for the undergraduate statistics course in psychology here at Stanford (Psych 10).
An introduction to Bayesian thinking: Companion book to the Statistics with R course on coursera.
Answering questions with data: Introductory statistics textbook.
Advanced R: The book is designed primarily for R users who want to improve their programming skills and understanding of the language.
Statistical Inference via Data Science: A moderndive into R and the tidyverse: A gentle introduction to the practice of analyzing data and answering questions using data the way data scientists, statisticians, data journalists, and other researchers would.
Just enough R: Basic introduction into some of the core R concepts.
Tidy Modeling with R: Introduction to modeling in R using the tidyverse.

Textbooks:

Data analysis: A model comparison approach to regression, ANOVA, and beyond: Data analysis for the social sciences with a focus on model comparison under the rubric of the general linear model.
Statistical rethinking: Introduction to Bayesian data analysis in R (see this free online book for an implementation using the tidyverse and brms packages).
A Student’s Guide to Bayesian Statistics: Quote from the book: “This book is for anyone who has tried and failed at statistics, particularly Bayesian statistics.”
Data analysis using regression and multilevel/hierarchical models: Extensive discussion of regression and multilevel models.
Discovering statistics using R: Introduction to frequentist statistics.

Grading

Homework: 40%
Midterm: 20%
Final project: 40%
- Proposal: 5%
- Presentation: 10%
- Report: 25%
Bonus:
- Ed discussion: 2%

Policies

Please familiarize yourself with Stanford’s honor code. We will adhere to it and follow through on its penalty guidelines.

When is the weekly homework due?

Each week, we will make the homework available on Friday after class. The homework is then due at 8pm on Thursday of the following week.

What if I turn my homework in late?

You will have 5 slip days in total. If you submit a homework within 24h after the deadline, this costs you one slip day (or 2 slip days if you submit it within 48h, etc.). Once you’ve used up all your slip days, late homework submissions from that point on will receive a score of 0.
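
The slip-day arithmetic can be sketched in R as follows. This is only an illustration: the function name slip_days_used() is hypothetical, and measuring lateness in hours with a submission exactly 24h late counting as one slip day is our reading of the rule above, not an official calculator.

```r
# Hypothetical helper: number of slip days a submission costs,
# given how many hours after the deadline it was submitted.
slip_days_used <- function(hours_late) {
  # On-time submissions cost nothing; otherwise round up to whole days.
  ifelse(hours_late <= 0, 0, ceiling(hours_late / 24))
}

slip_days_used(c(0, 3, 24, 30))  # 0 1 1 2

# Example: one homework 3h late and another 30h late use 1 + 2 slip days,
# leaving 2 of the 5 slip days.
total_slip_days <- 5
remaining <- total_slip_days - sum(slip_days_used(c(3, 30)))
remaining  # 2
```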

Can we work in groups?

Work for the course will include both homework assignments and a final project.

Homework assignments: You are encouraged to work in groups. However, your writeup must be your own (both the code and any written text). Please indicate on your writeup who you worked with.
Final project: You can either work on your own, or in a group of no more than three members. The project expectations scale with the size of the group (i.e. more is expected from a 3-person group compared to an individual project). A group will jointly write the project proposal, give the class presentation, and prepare the final report. Every member of a group will receive the same grade.

Support

Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).

Stanford is committed to ensuring that all courses are financially accessible to its students. If you require assistance with the cost of course textbooks, supplies, materials and/or fees, you can contact the First Generation and/or Low-Income Student Success Center to learn about the FLIbrary and other resources they have available for support.

Stanford offers several tutoring and coaching services:

Feedback

We welcome feedback regarding the course at any point. Please feel free to email us directly, or leave anonymous feedback for the teaching team by using this form.