oiLabs-base-R

new questions or data set in "intro to data" lab

Open beanumber opened this issue 10 years ago • 6 comments

Could we replace the body weight data set from the Intro to Data lab with something else? Or at least change the questions? As it stands, it strikes me as one of those surreptitious things that makes women feel bad about their bodies for no reason.

beanumber avatar Oct 02 '15 12:10 beanumber

Agreed. Should we use the nycflights13 data? It's a good one for a lab that doesn't involve inference.

mine-cetinkaya-rundel avatar Oct 02 '15 15:10 mine-cetinkaya-rundel

You'd have to change the Normal distribution lab as well. And I feel the data set is fine, just choose a different outcome variable perhaps.

norcalbiostat avatar Oct 02 '15 15:10 norcalbiostat

@norcalbiostat we could use different datasets for the two labs though, so I don't think we need to feel limited to variables that are normally distributed for the intro to data lab.

mine-cetinkaya-rundel avatar Oct 02 '15 16:10 mine-cetinkaya-rundel

But I think @norcalbiostat's point is that the body dimensions data in the Normal Distribution lab has the same problem.

beanumber avatar Oct 02 '15 16:10 beanumber

I should admit I haven't used the normal distribution lab in a while, so I should first correct myself - the two labs don't use the same dataset anyway.

I feel like the issue with the intro to data lab is the wdiff variable, which we then compare between men and women. The normal distribution lab compares heights, briefly, but beyond that doesn't go into comparing people's desired weights, so perhaps it's a bit more factual and a bit less about body image?

I'm completely on board with changing the dataset for the intro to data lab, as I think that lab can be enhanced to be more about data wrangling skills (in addition to resolving the issue @beanumber raised). And I'm also on board with changing the data in the normal distribution lab because it's not that exciting (likely the reason why I haven't been doing that lab lately...). But if we're prioritizing, it seems like the intro to data lab might have the more urgent issue to address.

mine-cetinkaya-rundel avatar Oct 02 '15 17:10 mine-cetinkaya-rundel

I'm all for refreshing data sets, but the challenge is always finding a replacement that is better. And there's often that unfortunate trade-off between data that clearly illustrate a statistical principle and data that are most interesting (please oh please, let us find a population level data set so we can replace the ames data).

I think a data wrangling lab based on the nycflights13 data would be terrific. It has heterogeneous data types and is interesting enough to naturally motivate several different questions and analyses. It also has that nice opportunity to define on-time performance in multiple ways, so it's an improvement on wdiff that way. If this lab were to replace lab 1, it's important that it cover some of the key points of chapter 1. It could also be cool to have it go off in its own data sciency direction, but then it would probably work best as an extra lab.

If I remember correctly, the main thing in favor of the bdims data set is that it's a collection of continuous variables that exhibit a mix of symmetric and skewed distributions. I think we should keep our eyes out for a more interesting replacement, but I have nothing on hand right now.

andrewpbray avatar Oct 02 '15 18:10 andrewpbray