pymc-examples
Analysing ordinal data in PyMC
Notebook proposal
Title: Analysing ordinal data in PyMC
Why should this notebook be added to pymc-examples?
Ordinal outcome variables are common in many data analysis situations. Example measures include:
- BMI: underweight, normal, overweight, obese
- Likert scale data, e.g. strongly agree, agree, neutral, disagree, strongly disagree.
People are often lazy in their analysis of ordinal data and fall back on treating it as continuous.
The goal of this example is to demonstrate current best practice for ordinal regression in PyMC. In particular, it will make use of the new pm.OrderedProbit and pm.OrderedLogit distributions. ~Once #5418 is merged, then~ we can go ahead with an example notebook.
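For readers new to these models, here is a minimal pure-Python sketch of the cumulative-logit construction that an ordered logit likelihood is built on: the probability of category k is the difference between the logistic CDF evaluated at adjacent cutpoints, shifted by the linear predictor. The function name `ordered_logit_probs` and the example cutpoints are illustrative assumptions, not part of the PyMC API.

```python
import math


def sigmoid(x):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(eta, cutpoints):
    """Category probabilities for len(cutpoints) + 1 ordered categories.

    Hypothetical helper for illustration only; `eta` is the linear
    predictor and `cutpoints` must be strictly increasing.
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities P(y <= k), padded with 0 and 1 at the ends.
    cumulative = [0.0] + [sigmoid(c - eta) for c in cutpoints] + [1.0]
    # Adjacent differences give the probability of each category.
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Four response categories need three cutpoints.
probs = ordered_logit_probs(eta=0.0, cutpoints=[-1.0, 0.0, 1.0])
# A larger linear predictor moves probability mass to higher categories.
shifted = ordered_logit_probs(eta=1.0, cutpoints=[-1.0, 0.0, 1.0])
```

The ordered probit version would follow the same pattern with the standard normal CDF in place of `sigmoid`.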
The plan is to put it in the GLM section. Current rough outline would be something like:
- What is ordinal data?
- Why is it crucial to analyse it properly?
- Priors over cutpoints: This could be an involved topic, but the short version is that some constraints on the cutpoint parameters are needed (see Discussion #5055). It will probably use my proposed ConstrainedUniform distribution (see https://github.com/pymc-devs/pymcx/issues/32). We can always circle back and update this if a more polished solution presents itself.
- Testing for group differences. Models such as response ~ group are useful for testing for differences in response distributions between groups.
- When you have a continuous predictor. E.g. response ~ continuous_predictor
- Maybe include the combination, response ~ continuous_predictor + group, if the notebook is not getting bloated and if it seems necessary.
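To make the cutpoint constraint in the outline concrete, here is a rough sketch of one way to read the idea: pin the first and last cutpoints to fixed values and place the interior ones using the cumulative sum of a flat Dirichlet draw, which keeps them ordered. This is only an assumption about the general approach, written with the standard library; the name `constrained_cutpoints` and its parameters are hypothetical, and it is not the implementation proposed in the linked pymcx issue.

```python
import random


def constrained_cutpoints(n_cutpoints, lower=0.0, upper=1.0, rng=random):
    """Draw ordered cutpoints pinned at `lower` and `upper`.

    Hypothetical illustration: interior cutpoints come from the
    cumulative sum of a Dirichlet(1, ..., 1) draw over the gaps.
    """
    if n_cutpoints < 2:
        raise ValueError("need at least two cutpoints")
    # A flat Dirichlet draw is a set of Gamma(1, 1) variates, normalised.
    gaps = [rng.gammavariate(1.0, 1.0) for _ in range(n_cutpoints - 1)]
    total = sum(gaps)
    cutpoints = [lower]
    running = 0.0
    # The last gap is implied by pinning the final cutpoint at `upper`.
    for gap in gaps[:-1]:
        running += gap / total
        cutpoints.append(lower + running * (upper - lower))
    cutpoints.append(upper)
    return cutpoints


# Five response categories coded 0..4 would use four cutpoints,
# here pinned at 0.5 and 3.5 (an arbitrary choice for the example).
cuts = constrained_cutpoints(4, lower=0.5, upper=3.5, rng=random.Random(0))
```

Pinning the end points removes the location and scale freedom of the latent variable, so only the relative spacing of the interior cutpoints is left to estimate.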
Related notebooks
As far as I understand, there are no existing notebooks which provide examples for the analysis of ordinal data. The closest I can find is an old PyMC port of Chapter 23 of Kruschke, but that's totally independent of pymc-examples.
References
- Liddell, T. M. & Kruschke, J. K. Analyzing ordinal data with metric models: What could possibly go wrong? J Exp Soc Psychol 79, 328–348 (2018).
- Bürkner, P.-C. & Vuorre, M. Ordinal Regression Models in Psychology: A Tutorial. Advances in Methods and Practices in Psychological Science 2, 77–101 (2019).
Is there anything blocking this one? I'm interested in this class of models. I couldn't tell whether there is still an issue with setting priors on the cutpoints; it seems it is now possible to pass in a vector. Happy to pick this one up if you like, @drbenvincent, but I'm also conscious that you seem to have done a lot of work on it already.
I initially wanted to work on it, but my plate is full at the moment. So no objections from me. No major blocker as far as I can tell.
Cool. I'll pick it up after the longitudinal one is done.
Just had a quick look at this one. It seems that even the example docstring for ordered logistic breaks now. Seems related to the shape attribute of the random variable.
I'm on the latest version, I think.
Opened a ticket: https://github.com/pymc-devs/pymc/issues/6610
In the meantime I'll experiment a bit more with your constrained uniform function.