
Bayesian bootstrap is not more precise after accounting for oversampling

Open · pmbaumgartner opened this issue 2 years ago · 1 comment

Hey Matteo -

Thank you for your blog post on the Bayesian bootstrap! I've found it quite helpful in adapting the approach to my own problems and in better understanding the differences between the Bayesian and classic bootstrap.

I was trying to replicate your analysis by rewriting some of the code, and I noticed that in the two-level sampling part of your post you oversample from the dataframe 10x (cell 19). That oversampling, not just the use of the Bayesian bootstrap, is why you get a more precise / narrower posterior distribution. You can check this yourself by oversampling in your classic bootstrap procedure, which results in this:

[screenshot: classic bootstrap with 10x oversampling, showing a similarly narrow distribution]
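To make the comparison concrete, here's a minimal sketch of the check I mean (not your original code; it assumes a toy dataframe with a single column y and bootstraps its mean, with the 10x factor mirroring cell 19):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-in for the blog post's dataframe (hypothetical column "y").
df = pd.DataFrame({"y": rng.normal(loc=1.0, scale=2.0, size=100)})
n = len(df)

def classic_boot(df, reps=1000, oversample=1):
    """Classic bootstrap of the mean of y, drawing oversample * n rows per replication."""
    stats = []
    for _ in range(reps):
        idx = rng.choice(n, size=oversample * n, replace=True)
        stats.append(df["y"].values[idx].mean())
    return np.array(stats)

plain = classic_boot(df)                       # usual bootstrap: n rows per draw
oversampled = classic_boot(df, oversample=10)  # 10x rows per draw, as in cell 19

# Oversampling shrinks the spread by roughly sqrt(10), mimicking a tighter posterior.
print(plain.std(), oversampled.std())
```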

In the wider context of the post, I think you do need to oversample to handle the rare-event cases you describe later. Without oversampling, some resamples will contain no rare events at all. You can see this with a regression that cannot take weights and therefore requires the two-level sampling procedure: because it actually resamples rows, some replications would fail to fit the model or would produce parameter estimates at extreme values.
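To illustrate, here is a rough sketch of the two-level procedure without the 10x oversampling (toy data and a statsmodels OLS stand in for your setup; your twolv_boot may differ): draw Dirichlet weights, resample n rows (not 10n) with those probabilities, then fit a model that cannot take weights. Replications where the rare event never appears simply cannot be fit, which is the failure mode I mean:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm  # any estimator that cannot take weights would do

rng = np.random.default_rng(0)

# Toy data with a rare binary regressor (hypothetical).
n = 200
x = (rng.random(n) < 0.03).astype(float)  # rare event: ~3% of rows
y = 1.0 + 2.0 * x + rng.normal(size=n)
df = pd.DataFrame({"x": x, "y": y})

def twolevel_boot(df, reps=1000):
    """Two-level Bayesian bootstrap: Dirichlet weights, then resample n rows (not 10n)."""
    coefs = []
    for _ in range(reps):
        w = rng.dirichlet(np.ones(len(df)))                          # level 1: weights
        idx = rng.choice(len(df), size=len(df), p=w, replace=True)   # level 2: resample
        d = df.iloc[idx]
        if d["x"].nunique() < 2:
            # The rare event never appeared in this resample: the regression cannot be fit.
            continue
        res = sm.OLS(d["y"], sm.add_constant(d["x"])).fit()
        coefs.append(res.params["x"])
    return np.array(coefs)

draws = twolevel_boot(df)
print(len(draws), draws.mean(), draws.std())
```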

pmbaumgartner · Nov 07 '22

Hi Peter,

Thanks for spotting it! The function twolv_boot is indeed oversampling; I will correct that. However, in the rest of the article I am not using that function, so I should not be oversampling there.
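For clarity, the rest of the article uses the weighted (single-level) version, roughly like the sketch below: the Dirichlet weights go straight into a weighted statistic, so there is no resampling step and hence no oversampling (toy data, not the article's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=100)  # toy data (hypothetical)

def bayes_boot_weighted(y, reps=1000):
    """Single-level Bayesian bootstrap: weight the statistic instead of resampling rows."""
    draws = []
    for _ in range(reps):
        w = rng.dirichlet(np.ones(len(y)))       # posterior weights for each observation
        draws.append(np.average(y, weights=w))   # weighted mean; no resampling step
    return np.array(draws)

posterior = bayes_boot_weighted(y)
print(posterior.mean(), posterior.std())
```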

Let me know if I missed something. And thanks again! Matteo

matteocourthoud · Nov 10 '22