This course offers an introduction to advanced topics in statistics with a focus on understanding data in the behavioral and social sciences. It is a practical course in which learning statistical concepts and building models in R go hand in hand. The course is organized into three parts: In the first part, we will learn how to visualize, wrangle, and simulate data in R. In the second part, we will cover topics in frequentist statistics (such as multiple regression, logistic regression, and mixed effects models) using the general linear model as an organizing framework. We will learn how to compare models using simulation methods such as bootstrapping and cross-validation. In the third part, we will focus on Bayesian data analysis as an alternative framework for answering statistical questions.

Requirement: Psych 10, Stats 60, or equivalent.


Team

  • Tobi Gerstenberg: Instructor (he/him), gerstenberg@stanford.edu, office hours Monday 1:00-2:00pm
  • Ari Beller: Teaching assistant (he/they), abeller@stanford.edu, office hours Friday 11:30am-12:30pm
  • Sarah Wu: Teaching assistant (she/her), sarahawu@stanford.edu, office hours Tuesday 1:00-2:00pm
  • Chengxu Zhuang: Teaching assistant (he/him), chengxuz@stanford.edu, office hours Thursday 10:00-11:00am

Where and when?

In the first two weeks of class, we’ll meet via Zoom. Afterwards, we’ll meet in person at the times and locations shown below.


Lectures: The class meets Monday, Wednesday, and Friday 9:45-11:15am in 200-205 (Lane History Corner).

Sections: Sections are on Tuesday 5:30-6:30pm in 320-220 and on Thursday 2:45-3:45pm in 260-011 (attendance is optional).

Overview

Day Date Topic
Monday January 3rd Introduction
Wednesday January 5th Visualization I
Friday January 7th Visualization II
Monday January 10th Data wrangling I
Wednesday January 12th Data wrangling II
Friday January 14th Probability
Monday January 17th No class (Martin Luther King Jr. Day)
Wednesday January 19th Simulation I
Friday January 21st Simulation II
Monday January 24th Modeling data
Wednesday January 26th Linear model I
Friday January 28th Linear model II
Monday January 31st Linear model III
Wednesday February 2nd Linear model IV
Friday February 4th Power analysis
Monday February 7th Model comparison
Wednesday February 9th No class (Midterm)
Friday February 11th Causality; Midterm due
Monday February 14th Linear mixed effects models I
Wednesday February 16th Linear mixed effects models II
Thursday February 17th Project proposal due
Friday February 18th Linear mixed effects models III
Monday February 21st No class (Presidents’ Day)
Wednesday February 23rd Linear mixed effects models IV
Friday February 25th Generalized linear model
Monday February 28th Bayesian data analysis I
Wednesday March 2nd Bayesian data analysis II
Friday March 4th Bayesian data analysis III
Monday March 7th Bayesian data analysis IV
Wednesday March 9th Guest lecture
Friday March 11th Guest lecture
Wednesday March 16th Final project presentations
Friday March 18th Final project report due

More information about each class

Introduction

Content:

  • Course introduction

Resources:

Datacamp:

Visualization I

Content:

  • Get familiar with the RStudio interface.
  • Take a look at some suboptimal plots, and think about how to make them better.
  • Understand the general philosophy behind ggplot2 – a grammar of graphics.
  • Understand the mapping from data to geoms in ggplot2.
  • Create informative figures using grouping and facets.

Resources:

Datacamp:

Reading:

Visualization II

Content:

  • Decide which plot is appropriate for which kind of data.
  • Customize plots: Take a sad plot and make it better.
  • Save plots.
  • Make figure panels.
  • Debug.
  • Make animations.
  • Define snippets.

Resources:

Datacamp:

Reading:

Data wrangling I

Content:

  • Review R basics (incl. variable modes, data types, operators, control flow, and functions).
  • Learn how the pipe operator %>% works.
  • See different ways of getting a sense of one’s data.
  • Master key data manipulation verbs from the dplyr package (incl. filter(), rename(), select(), mutate(), and arrange()).

Resources:

Datacamp:

Reading:

Data wrangling II

Content:

  • Learn how to group and summarize data using group_by() and summarize().
  • Learn how to deal with missing data entries (NA).
  • Get familiar with how to reshape data using pivot_longer(), pivot_wider(), separate() and unite().
  • Learn the basics of how to join multiple data frames with a focus on left_join().
  • Master how to read and save data.

Resources:

Datacamp:

Reading:

Probability

Content:

  • Refresh our understanding of probability theory.
    • Conditional probability.
    • Independence.
    • Joint probability.
    • Law of Total Probability.
    • Bayes’ rule.
  • Appreciate different interpretations of probability.
  • Develop a basic understanding of Bayesian networks and common patterns of inference.
  • Causal Bayesian networks: difference between observation and intervention.

Resources:

Datacamp:

Reading:

Simulation I

Content:

  • Working with probability distributions.
    • dnorm(), pnorm(), qnorm(), rnorm()
  • Computing probabilities.
  • Bayesian inference (analytic and via sampling).
  • Working with samples.
    • density(), quantile()
    • Comparing distributions.

Datacamp:

Reading:

Simulation II

Content:

  • The rationale behind statistical inference.
  • The central limit theorem.
  • Understanding sampling distributions.
  • Understanding p-values via a permutation test.
  • Correctly interpreting confidence intervals.

Resources:

Datacamp:

Reading:

Modeling data

Content:

  • Hypothesis testing as model comparison.
  • Modeling data: Data = Model + Error
  • Error and parameter estimates.
  • Properties of estimators.
  • Statistical inferences about parameter values.

Datacamp:

Reading:

Linear model I

Content:

  • Correlation.
    • Pearson’s product-moment correlation.
    • Spearman’s rank correlation.
  • Regression.
    • Understand conceptually and learn how to do it in R.

Resources:

Datacamp:

Reading:

Linear model II

Content:

  • Multiple regression.
    • Appreciate model assumptions.
  • Several continuous predictors.
    • Hypothesis tests.
    • Interpreting parameters.
    • Reporting results.
  • One categorical predictor.
  • Both continuous and categorical predictors.
  • Interpreting interactions.

Resources:

Datacamp:

Reading:

Linear model III

Content:

  • Linear model with one multi-level categorical predictor (One-way ANOVA).
  • Linear model with multiple categorical predictors (N-way ANOVA).
    • dummy-coding vs. effect-coding
    • planned contrasts

Resources:

Datacamp:

Reading:

Linear model IV

Content:

  • Interpreting ANOVA results.
  • Simulating data and inferring model parameters.
  • Planned contrasts.
    • Defining contrast codes.

Datacamp:

Reading:

Power analysis

Content:

  • Making decisions based on statistical inference.
  • The concept of statistical power.
  • Calculating power.
  • Common effect size measures.

Resources:

Datacamp:

Reading:

Model comparison

Content:

  • Model comparison.
  • Underfitting vs. overfitting.
  • Cross-validation.
    • Leave-one-out cross-validation.
    • k-fold cross-validation.
    • Monte Carlo cross-validation.
  • Information criteria: AIC and BIC.

Resources:

Datacamp:

Reading:

Causality

Content:

  • Simulating a mediation analysis.
  • Baron and Kenny’s (1986) steps for mediation.
  • Testing the significance of a mediation.
    • Sobel test.
    • Bootstrapping.
    • Bayesian approach.
  • Limitations of mediation analysis.
  • Simulating a moderator effect.

Resources:

Reading:

Linear mixed effects models I

Content:

  • Understanding sources of dependence in data.
    • fixed effects vs. random effects.
  • lmer() syntax in R.
  • Understanding the lmer() summary.
  • Simulating data from an lmer().

Resources:

Datacamp:

Reading:

Linear mixed effects models II

Content:

  • Understanding Simpson’s paradox.
  • An lmer() worked example.
    • complete pooling vs. no pooling vs. partial pooling.

Resources:

Reading:

Linear mixed effects models III

Content:

  • Bootstrapping linear mixed effects models.
  • Getting p-values.
  • Pitfalls in fitting lmer()s (and what to do about them).
  • Understanding lmer() syntax even better.

Reading:

Linear mixed effects models IV

  • Content to be determined

Generalized linear model

Content:

  • Logistic regression.
  • Logit transform.
  • Fitting a logistic regression in R.
  • Visualizing and interpreting model predictions.
  • Simulating data from a logistic regression.
  • Assessing model fit.
  • Testing hypotheses.
  • Reporting results.
  • Mixed effects logistic regression.

Resources:

Datacamp:

Reading:

Bayesian data analysis I

Content:

  • Comparison between frequentist and Bayesian data analysis.
  • Objections to frequentist null hypothesis testing.
  • Benefits of Bayesian data analysis.
  • Bayesian models of cognition.
  • A simple worked coin flip example.
  • Posterior, prior, likelihood, sample size.

Datacamp:

Reading:

Bayesian data analysis II

Content:

  • Simple Bayesian inference example.
  • Bayes’ rule in action.
    • Common likelihood functions.
    • Common prior functions.
    • How to do inference.
  • Doing Bayesian data analysis.
    • A simple linear regression.
    • Posterior predictive checks.
    • Credible interval vs. confidence interval.

Reading:

Bayesian data analysis III

Content:

  • Building Bayesian models with brms.
    • Model evaluation:
      • Visualizing and interpreting results.
      • Testing hypotheses.
    • Inference evaluation: Did things work out?
  • Some cool examples:
    • Evidence for null results.
    • Dealing with unequal variance.
    • Zero-one inflated beta binomial model.
    • Ordinal logistic regression.
    • Regression with strictly positive weights.

Resources:

Reading:

Bayesian data analysis IV

  • Content to be determined

Getting ready

Here is what you need to get ready for class.

Getting started with R

Sign up for these tools

Canvas:

  • Assignments will be posted here.
  • Use NameCoach so that we know how to pronounce your name.
    • We, the teaching team, affirm people of all gender expressions and gender identities.
    • If you prefer to be called a different name than what is indicated on the class roster, please let us know.
    • Feel free to correct us on your pronouns.
    • If you have any questions or concerns, please do not hesitate to contact us.

Piazza

  • Forum for discussing lectures and assignments.
  • We will send you the access code via Canvas.

Datacamp

  • Free online interactive tutorials, available to students in this class.
  • We will send you a sign-up link via Canvas that gives you free access to all of Datacamp.

PollEverywhere

  • Used for responding to short quizzes and polls in class.
  • You will need a computer or device that enables you to respond during every class session.
  • We will help you get set up in the first class.
  • Our polls will be posted here: www.pollev.com/psych252

What you will learn

You will learn how to use R to …

  • read, wrangle, simulate and analyze data
  • make publication-ready plots

Understand the philosophy behind null hypothesis significance testing (NHST) and Bayesian statistics through …

  • running computer simulations and visualizing the results

Formulate research questions as statistical models and …

  • determine which models work for different situations
  • check whether the model’s assumptions are met, assess how much violations matter, and decide what to do if assumptions aren’t met

Communicate what you have learned about your data …

  • in short presentations in class, showcasing your visualization and analysis
  • in written reports

Contribute to open and reproducible science through …

  • adopting good coding practices
  • sharing your data and research reports online

What to expect?

In “A Vision for Stanford”, university president Marc Tessier-Lavigne states that Stanford wants to be

“an inspired, inclusive and collaborative community of diverse scholars, students and staff, where all are supported and empowered to thrive.”

Let’s try our best together in this class to make this happen!

What you can expect from me

I will …

  • start and end each class on time.
  • be there for you after class in case you have any questions.
  • be there for you during office hours.
  • not be able to provide general statistics consulting; the Statistics Department provides consultations.

What I expect from you

You will …

  • get ready for the course.
  • attend the classes and participate in class discussion.
  • submit your homework assignments, midterm, and final project on time.

Resources

Readings

For many classes, there will be readings and/or accompanying online interactive tutorials. We won’t adopt a course textbook.

Course notes:

The course notes are available as an online book here.

Free online books:

Textbooks:

Data sets

Here are some sources for finding interesting data sets for homework assignments:

Grading

  • Homework: 40%
  • Midterm: 20%
  • Final project: 40%
    • Proposal: 5%
    • Presentation: 10%
    • Report: 25%
  • Bonus:
    • Ed discussion: 2%
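As a rough sketch of how these weights combine, assuming each component is scored out of 100 and the bonus adds up to 2 percentage points on top, write H for homework, M for the midterm, P for the proposal, S for the presentation, R for the report, and B for the bonus (0 ≤ B ≤ 2). The three project parts (5% + 10% + 25%) add up to the 40% final project weight:

$$\text{Final} = 0.40\,H + 0.20\,M + 0.05\,P + 0.10\,S + 0.25\,R + B$$

With hypothetical scores H = 90, M = 80, P = 100, S = 90, R = 85 and no bonus, this gives 36 + 16 + 5 + 9 + 21.25 = 87.25.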

Policies

Please familiarize yourself with Stanford’s honor code. We will adhere to it and follow through on its penalty guidelines.

When is the weekly homework due?

Each week, we will make the homework available on Friday after class. The homework is then due the following Thursday at 8pm.

What if I turn my homework in late?

You will have 5 slip days in total. If you submit a homework within 24h after the deadline, this costs you one slip day (or two slip days if you submit it within 48h, etc.). Once you’ve used up all your slip days, any further late homework submissions will receive a score of 0.
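One way to read the slip-day rule as a formula, assuming every started 24-hour block after the deadline counts as one full slip day: a homework submitted h hours late (h > 0) uses

$$d(h) = \left\lceil \frac{h}{24} \right\rceil$$

slip days. For example, a submission 30 hours after the deadline falls within 48h and uses $\lceil 30/24 \rceil = \lceil 1.25 \rceil = 2$ slip days.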

Can we work in groups?

Work for the course will include both homework assignments and a final project.

  • Homework assignments: You are encouraged to work in groups. However, your writeup must be your own (both the code and any written text). Indicate on your writeup who you worked with.
  • Final project: You can either work on your own, or in a group of no more than three members. The project expectations scale with the size of the group (i.e. more is expected from a 3-person group compared to an individual project). A group will jointly write the project proposal, give the class presentation, and prepare the final report. Every member of a group will receive the same grade.

Support

Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).

Stanford is committed to ensuring that all courses are financially accessible to its students. If you require assistance with the cost of course textbooks, supplies, materials and/or fees, you should contact the Diversity & First-Gen Office (D-Gen) at opportunityfund@stanford.edu to learn about the FLIbrary and other resources they have available for support.

Stanford offers several tutoring and coaching services:

Feedback

We welcome feedback regarding the course at any point. Please feel free to email us directly, or leave anonymous feedback for the teaching team by using this form.