songbird
Add explanations for what exactly "Intercept" differentials mean
This has come up before, but I'm making it an issue here so it's officially written down somewhere.
From discussion with @antgonza and many other people :) Relates to biocore/qurro#229.
Yeah, I need to write up a blog post on this - that'll be up within the next 3 weeks.
Was chatting with @antgonza today about formula stuff, and I found this video from one of the Patsy devs -- it does a super good job explaining both categorical encodings and intercept stuff.
A few relevant timestamps:
- Treatment coding stuff: around 3:12
- Intercept stuff: around 4:45
- Explanation about why reference categories are needed in general: around 8:35
For "normal" uses of Patsy (treatment coding with a single categorical predictor), the intercept is the mean of whatever the "reference" group is, and every other coefficient represents a difference from that mean. So e.g. in the OLS example data on the screen at around 6:40, the Intercept coefficient (group 1 as reference) is 46.4583, and the group 2 coefficient is 11.5417. When you set group 2 as the reference instead, the group 1 coefficient is -11.5417 (because the direction of the difference has flipped), and the Intercept, which now corresponds to group 2, is 58 (aka 46.4583 + 11.5417).
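To make the arithmetic above concrete, here is a minimal pure-Python sketch of treatment-coded OLS with one two-level categorical predictor. The data and the function name `fit_treatment_ols` are hypothetical (these are not the numbers from the video), and this is plain OLS rather than Songbird's multinomial regression, so it only illustrates how the reference choice shifts the Intercept and flips the sign of the other coefficient.

```python
# Hypothetical toy data (NOT the dataset from the video): two groups, three samples each.
groups = [1, 1, 1, 2, 2, 2]
y = [44.0, 46.0, 48.0, 57.0, 58.0, 59.0]


def fit_treatment_ols(groups, y, reference):
    """Hypothetical helper: fit y ~ Intercept + indicator(group != reference) by OLS.

    Treatment coding gives a design matrix with columns [1, x], where x = 1 for
    the non-reference group and 0 for the reference group. The 2x2 normal
    equations (X^T X) b = X^T y are solved with Cramer's rule.
    """
    x = [0.0 if g == reference else 1.0 for g in groups]
    n = len(y)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det  # equals the reference group's mean
    diff = (n * sxy - sx * sy) / det         # other group's mean minus reference mean
    return intercept, diff


for ref in (1, 2):
    intercept, diff = fit_treatment_ols(groups, y, reference=ref)
    print(f"reference={ref}: Intercept={intercept:.4f}, other-group coefficient={diff:.4f}")
# reference=1: Intercept=46.0000, other-group coefficient=12.0000
# reference=2: Intercept=58.0000, other-group coefficient=-12.0000
```

As in the video's example, the Intercept under one reference plus the group coefficient gives the Intercept under the other reference (46 + 12 = 58 here).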
I'm not quite sure how this translates to an interpretation of the Intercept differentials you get, but at the very least it'd be good to add a link to this video to the README in the future.
Thanks for raising this issue, fedarko! I had the same question.
For reference, @mortonjt has written a blog post here explaining this in the context of Songbird. We may want to add a link to it from the README.