Chapter 15 Bootstrapping

This chapter was written by Andrew Lampinen.

15.1 Learning goals

  • Why bootstrapping?
  • What is bootstrapping?
  • Bootstrap confidence intervals.
  • Bootstrap (and permutation) tests.
  • Bootstrap power analysis.

15.3 What’s wrong with parametric tests?

15.3.1 T-tests on non-normal distributions

Let’s see some examples! One common non-normal distribution is the log-normal distribution, i.e. a distribution that is normal after you take its logarithm. Many natural processes have distributions like this. One of particular interest to us is reaction times.
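A quick way to see this in R (illustrative parameter values, not the chapter's):

```r
# Simulate log-normal "reaction times": normal on the log scale,
# right-skewed on the raw scale.
log_rt = rnorm(10000, mean = 6, sd = 0.5)  # log milliseconds (illustrative)
rt = exp(log_rt)

# Right skew pulls the mean above the median on the raw scale...
c(mean = mean(rt), median = median(rt))

# ...while the log-transformed values are symmetric around their center.
hist(log(rt))
```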

Let’s see how violating the assumption of normality changes the results of t.test. We’ll compare two situations:

Valid: comparing two normally distributed populations with equal means but unequal variances.

Invalid: comparing two log-normally distributed populations with equal means but unequal variances.

[1] 41
[1] 25

That’s a non-trivial reduction in power from a misspecified model! (~80% to ~54%).

(Bootstrap tests are omitted here because bootstrapping is problematic at these small sample sizes; permutation tests are better behaved.)

While the bootstrap actually loses some power relative to a perfectly specified parametric model, it is much more robust to violations of that model’s assumptions, and so it retains more of its power when those assumptions are violated.
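The helper perm_mean_diff_test used in the simulation below is assumed to be defined earlier in the chapter; a minimal sketch of such a two-sample permutation test (our own illustration, not necessarily the chapter's implementation) might look like:

```r
# Two-sample permutation test for a difference in means.
# Returns TRUE if the observed difference is significant at level alpha.
perm_mean_diff_test = function(x, y, num_perms = 500, alpha = 0.05) {
  observed = mean(x) - mean(y)
  combined = c(x, y)
  n_x = length(x)
  # null distribution: shuffle group labels, recompute the difference
  perm_diffs = replicate(num_perms, {
    shuffled = sample(combined)
    mean(shuffled[1:n_x]) - mean(shuffled[-(1:n_x)])
  })
  # two-tailed p-value: proportion of permuted diffs at least as extreme
  p = mean(abs(perm_diffs) >= abs(observed))
  p < alpha
}
```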

# this could be much more efficient; we at least factor out the repeated test logic
# (perm_mean_diff_test is assumed to be defined in an earlier chunk, not shown here)
gen_data_and_norm_and_perm_test = function(num_observations_per) {
  # run both tests on one simulated pair of samples
  run_both_tests = function(x, y, distribution, null_true) {
    data.frame(distribution = distribution,
               null_true = null_true,
               parametric = t.test(x, y)$p.value < 0.05,
               permutation = perm_mean_diff_test(x, y))
  }
  
  bind_rows(
    # normally distributed, null true
    run_both_tests(rnorm(num_observations_per, 0, 1.1),
                   rnorm(num_observations_per, 0, 1.1),
                   "Normal", TRUE),
    # normally distributed, true difference in means
    run_both_tests(rnorm(num_observations_per, 0, 1.1),
                   rnorm(num_observations_per, 1, 1.1),
                   "Normal", FALSE),
    # log-normally distributed, null true
    run_both_tests(exp(rnorm(num_observations_per, 0, 1.1)),
                   exp(rnorm(num_observations_per, 0, 1.1)),
                   "Log-normal", TRUE),
    # log-normally distributed, true difference in (log-scale) means
    run_both_tests(exp(rnorm(num_observations_per, 0, 1.1)),
                   exp(rnorm(num_observations_per, 1, 1.1)),
                   "Log-normal", FALSE)
  )
}

num_tests = 100
num_observations_per = 20  # assumed stand-in; the actual value is set in a chunk not shown here

perm_results = replicate(num_tests,
                         gen_data_and_norm_and_perm_test(num_observations_per),
                         simplify = FALSE) %>%
  bind_rows()

# A tibble: 8 x 4
# Groups:   test_type, distribution [4]
  test_type   distribution null_true        pct_significant
  <chr>       <fct>        <chr>                      <dbl>
1 parametric  Normal       Alternative True            0.75
2 parametric  Normal       Null True                   0.07
3 parametric  Log-normal   Alternative True            0.53
4 parametric  Log-normal   Null True                   0.02
5 permutation Normal       Alternative True            0.76
6 permutation Normal       Null True                   0.07
7 permutation Log-normal   Alternative True            0.68
8 permutation Log-normal   Null True                   0.03

15.3.2 Non-IID noise and linear models

Cf. Anscombe’s quartet, etc.: very different error structures can produce nearly identical fits and summary statistics.
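The fits below come from data with either homoscedastic (IID) or heteroscedastic errors. A hedged sketch of how such data might be generated (the variable names DV, IV, and type match the model calls below, but the exact generating process is an assumption):

```r
library(dplyr)

n = 500
IV = rnorm(n)
parametric_ci_data = bind_rows(
  # IID error: constant noise variance, independent of the predictor
  data.frame(type = "IID Error", IV = IV,
             DV = IV + rnorm(n, 0, 1)),
  # non-IID ("scaling") error: noise that grows with the predictor
  data.frame(type = "Non-IID Error", IV = IV,
             DV = IV + rnorm(n, 0, 1) * abs(IV))
)
```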


Call:
lm(formula = DV ~ IV, data = parametric_ci_data %>% filter(type == 
    "IID Error"))

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7575 -0.6600 -0.0231  0.6768  3.2956 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.03103    0.04379  -0.709    0.479    
IV           1.11977    0.07739  14.470   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9791 on 498 degrees of freedom
Multiple R-squared:  0.296, Adjusted R-squared:  0.2946 
F-statistic: 209.4 on 1 and 498 DF,  p-value: < 2.2e-16


Call:
lm(formula = DV ~ IV, data = parametric_ci_data %>% filter(type == 
    "Non-IID Error"))

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0872 -0.4030  0.0441  0.5100  3.4718 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.05623    0.04928  -1.141    0.254    
IV           1.22127    0.08708  14.024   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.102 on 498 degrees of freedom
Multiple R-squared:  0.2831,    Adjusted R-squared:  0.2817 
F-statistic: 196.7 on 1 and 498 DF,  p-value: < 2.2e-16

Challenge Q: Why isn’t the error on the intercept changed in the scaling error case?

This can result in CIs that aren’t actually at their nominal confidence level! And since CIs are equivalent to t-tests in this setting, it can also inflate false positive rates. (These CIs are also equivalent to Bayesian credible intervals in this setting.)
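One robust alternative is a case-resampling bootstrap for the slope: resample whole (IV, DV) rows with replacement, refit the model, and take percentile quantiles of the resampled slopes. A minimal sketch (this helper is our own illustration, assuming a data frame with columns IV and DV):

```r
# Percentile bootstrap CI for a regression slope via case resampling.
boot_slope_ci = function(d, num_boots = 1000, conf = 0.95) {
  slopes = replicate(num_boots, {
    resampled = d[sample(nrow(d), replace = TRUE), ]  # resample rows
    coef(lm(DV ~ IV, data = resampled))["IV"]
  })
  alpha = 1 - conf
  quantile(slopes, c(alpha / 2, 1 - alpha / 2))
}
```

Because the rows are resampled together, the heteroscedasticity of the errors is carried along into each refit, so the resulting interval reflects it.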

# A tibble: 8 x 5
  parameter CI_type    significant     n  prop
  <chr>     <chr>      <lgl>       <int> <dbl>
1 intercept boot       FALSE          97  0.97
2 intercept boot       TRUE            3  0.03
3 intercept parametric FALSE          97  0.97
4 intercept parametric TRUE            3  0.03
5 slope     boot       FALSE          98  0.98
6 slope     boot       TRUE            2  0.02
7 slope     parametric FALSE          88  0.88
8 slope     parametric TRUE           12  0.12

For the slope, the parametric model’s false positive rate (12%) is nearly triple the nominal 5% level, while the bootstrap CIs stay below it (2%)!

15.4 Bootstrap resampling

15.4.1 Demo
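A minimal sketch of the core resampling idea (our own illustration, with invented data): draw a sample of the same size from the observed data with replacement, recompute the statistic, and repeat many times.

```r
set.seed(1)  # for reproducibility
x = rnorm(30, mean = 100, sd = 15)  # illustrative "observed" sample

# one bootstrap resample: same size as the data, drawn with replacement
one_resample = sample(x, replace = TRUE)

# the bootstrap distribution of the mean: resample and recompute many times
boot_means = replicate(10000, mean(sample(x, replace = TRUE)))

# its spread approximates the standard error of the mean
sd(boot_means)
sd(x) / sqrt(length(x))  # analytic standard error, for comparison
```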

15.5 Applications

15.5.1 Bootstrap confidence intervals
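The output below comes from the boot package. A hedged sketch of the kind of call that produces it (the data and statistic here are illustrative, not the chapter's):

```r
library(boot)

x = rnorm(30, mean = 95, sd = 20)  # illustrative data

# boot() expects a statistic function of (data, resample indices)
mean_stat = function(data, indices) mean(data[indices])

bootstrap_results = boot(x, statistic = mean_stat, R = 100)
boot.ci(bootstrap_results)  # normal, basic, percentile, and BCa intervals
```

Note that with only R = 100 replicates, boot.ci warns about unstable intervals and extreme order statistics, as in the output below; many more replicates are usually recommended.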

Warning in boot.ci(bootstrap_results): bootstrap variances needed for
studentized intervals
Warning in norm.inter(t, adj.alpha): extreme order statistics used as
endpoints
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 100 bootstrap replicates

CALL : 
boot.ci(boot.out = bootstrap_results)

Intervals : 
Level      Normal              Basic         
95%   ( 86.52, 102.43 )   ( 88.40, 103.45 )  

Level     Percentile            BCa          
95%   (84.75, 99.81 )   (84.00, 99.68 )  
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
Some percentile intervals may be unstable
Warning : BCa Intervals used Extreme Quantiles
Some BCa intervals may be unstable

15.5.2 Bootstrap (& permutation) hypothesis tests
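A common recipe for a bootstrap hypothesis test is CI inversion: compute a bootstrap CI for the statistic and reject the null hypothesis if the null value falls outside it. A hedged sketch, assuming (for illustration only) that we are testing whether an accuracy proportion differs from chance (0.5):

```r
library(boot)

correct = rbinom(20, 1, prob = 0.85)  # illustrative binary accuracy data

prop_stat = function(data, indices) mean(data[indices])
bootstrap_results = boot(correct, statistic = prop_stat, R = 20000)

# percentile CI; the last two entries of $percent are the endpoints
ci = boot.ci(bootstrap_results, type = "perc")$percent[4:5]

# reject H0 (chance performance, 0.5) if it falls outside the interval
significant = ci[1] > 0.5 || ci[2] < 0.5
```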

Warning in boot.ci(bootstrap_results): bootstrap variances needed for
studentized intervals
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 20000 bootstrap replicates

CALL : 
boot.ci(boot.out = bootstrap_results)

Intervals : 
Level      Normal              Basic         
95%   ( 0.6940,  1.0061 )   ( 0.7000,  1.0000 )  

Level     Percentile            BCa          
95%   ( 0.70,  1.00 )   ( 0.55,  0.95 )  
Calculations and Intervals on Original Scale

15.6 Session info

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] modelr_0.1.5         extraDistr_1.8.11    ggrepel_0.8.1       
 [4] cowplot_1.0.0        tidybayes_1.1.0      greta_0.3.1.9005    
 [7] janitor_1.2.0        kableExtra_1.1.0     gapminder_0.3.0     
[10] gganimate_1.0.3      ggridges_0.5.1       ggpol_0.0.5         
[13] styler_1.1.1.9003    forcats_0.4.0        stringr_1.4.0       
[16] dplyr_0.8.3          purrr_0.3.3          readr_1.3.1         
[19] tidyr_1.0.0          tibble_2.1.3         ggplot2_3.2.1       
[22] tidyverse_1.2.1      patchwork_0.0.1.9000 boot_1.3-23         
[25] knitr_1.25          

loaded via a namespace (and not attached):
 [1] nlme_3.1-141              lubridate_1.7.4          
 [3] webshot_0.5.1             RColorBrewer_1.1-2       
 [5] progress_1.2.2            httr_1.4.1               
 [7] tools_3.6.1               backports_1.1.5          
 [9] utf8_1.1.4                R6_2.4.1                 
[11] lazyeval_0.2.2            colorspace_1.4-1         
[13] withr_2.1.2               tidyselect_0.2.5         
[15] prettyunits_1.0.2         compiler_3.6.1           
[17] cli_1.1.0                 rvest_0.3.4              
[19] arrayhelpers_1.0-20160527 xml2_1.2.2               
[21] labeling_0.3              bookdown_0.13            
[23] scales_1.0.0              tfruns_1.4               
[25] digest_0.6.22             rmarkdown_1.15           
[27] base64enc_0.1-3           pkgconfig_2.0.3          
[29] htmltools_0.3.6           rlang_0.4.1              
[31] readxl_1.3.1              rstudioapi_0.10          
[33] svUnit_0.7-12             farver_2.0.1             
[35] generics_0.0.2            jsonlite_1.6             
[37] tensorflow_2.0.0          magrittr_1.5             
[39] Matrix_1.2-17             Rcpp_1.0.3               
[41] munsell_0.5.0             fansi_0.4.0              
[43] reticulate_1.13           lifecycle_0.1.0          
[45] stringi_1.4.3             whisker_0.4              
[47] yaml_2.2.0                ggstance_0.3.3           
[49] plyr_1.8.4                grid_3.6.1               
[51] parallel_3.6.1            listenv_0.7.0            
[53] crayon_1.3.4              lattice_0.20-38          
[55] haven_2.1.1               hms_0.5.2                
[57] zeallot_0.1.0             pillar_1.4.2             
[59] reshape2_1.4.3            codetools_0.2-16         
[61] glue_1.3.1                evaluate_0.14            
[63] vctrs_0.2.0               tweenr_1.0.1             
[65] cellranger_1.1.0          gtable_0.3.0             
[67] future_1.15.0             assertthat_0.2.1         
[69] xfun_0.9                  broom_0.5.2              
[71] coda_0.19-3               viridisLite_0.3.0        
[73] globals_0.12.4            ellipsis_0.3.0