
Analysing ordinal data in PyMC

Open drbenvincent opened this issue 2 years ago • 5 comments

Notebook proposal

Title: Analysing ordinal data in PyMC

Why should this notebook be added to pymc-examples?

Ordinal outcome variables are common in many data analysis situations. Example measures include:

  • BMI: underweight, normal, overweight, obese
  • Likert scale data, e.g. strongly agree, agree, neutral, disagree, strongly disagree.

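Ordinal data is often analysed as if it were continuous, which ignores the fact that the distances between categories are unknown and can lead to misleading conclusions (see Liddell & Kruschke, 2018, in the references).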

The goal of this example is to demonstrate current best practice for ordinal regression in PyMC. In particular, it will make use of the new pm.OrderedProbit and pm.OrderedLogistic distributions. ~Once #5418 is merged, then~ we can go ahead with an example notebook.
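To make the idea concrete, here is a minimal plain-Python sketch of what an ordered logit likelihood computes: the probability of each ordered category given a latent predictor and a set of cutpoints. The function names, cutpoint values and the two group effects are made up for illustration, and the parameterisation P(y <= k) = logistic(c_k - eta) is the conventional one, assumed here rather than taken from the PyMC source.

```python
import math


def logistic(x):
    """Standard logistic (inverse-logit) function."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(eta, cutpoints):
    """Category probabilities for K = len(cutpoints) + 1 ordered categories.

    Assumes the cumulative-logit form P(y <= k) = logistic(c_k - eta).
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    # Cumulative probabilities, padded with 0 below and 1 above.
    cumulative = [0.0] + [logistic(c - eta) for c in cutpoints] + [1.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]


# Hypothetical values: 3 cutpoints give 4 response categories.
cutpoints = [-1.0, 0.0, 1.5]
probs_a = ordered_logit_probs(0.0, cutpoints)  # e.g. a reference group
probs_b = ordered_logit_probs(1.0, cutpoints)  # e.g. a group with a higher latent mean

for name, probs in [("a", probs_a), ("b", probs_b)]:
    print(name, [round(p, 3) for p in probs])
```

A larger latent predictor shifts probability mass towards the higher categories without assuming the categories are equally spaced, which is the property a metric (continuous) model cannot provide.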

The plan is to put it in the GLM section. Current rough outline would be something like:

  • What is ordinal data?
  • Why is it crucial to analyse it properly?
  • Priors over cutpoints: This could be an involved topic, but long story short is that some constraints on the cutpoint parameters are needed (see Discussion #5055). It will probably use my proposed ConstrainedUniform distribution (see https://github.com/pymc-devs/pymcx/issues/32). We can always circle back and update this if a more polished solution presents itself.
  • Testing for group differences. Models such as response ~ group are useful for testing for differences in response distributions between groups.
  • When you have a continuous predictor. E.g. response ~ continuous_predictor
  • Maybe include the combination, response ~ continuous_predictor + group if the notebook is not getting bloated, and if it seems necessary.
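As a rough illustration of the cutpoint constraint mentioned in the outline, the sketch below draws cutpoints that are strictly ordered and confined to a fixed interval. This is not the proposed ConstrainedUniform distribution itself. It is a plain-Python stand-in under the assumption that the first and last cutpoints are pinned to the bounds and the interior ones come from a cumulative sum of flat-Dirichlet gaps. The function name and the bounds used are hypothetical.

```python
import random


def sample_bounded_cutpoints(n_cutpoints, lower, upper, rng):
    """Draw ordered cutpoints in [lower, upper] with both ends pinned.

    Assumption: gaps between neighbouring cutpoints follow a flat Dirichlet,
    obtained here by normalising independent Exponential(1) draws.
    """
    if n_cutpoints < 2:
        raise ValueError("need at least two cutpoints")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    gaps = [rng.expovariate(1.0) for _ in range(n_cutpoints - 1)]
    total = sum(gaps)
    cutpoints = [lower]
    running = 0.0
    # The running sum of normalised gaps gives the interior cutpoints.
    for gap in gaps[:-1]:
        running += gap
        cutpoints.append(lower + (upper - lower) * running / total)
    cutpoints.append(upper)
    return cutpoints


# Hypothetical setup: 5 cutpoints on a latent scale running from 0.5 to 5.5.
cuts = sample_bounded_cutpoints(5, 0.5, 5.5, random.Random(0))
print([round(c, 3) for c in cuts])
```

Pinning the outer cutpoints fixes the location and scale of the latent variable, which is one way to resolve the identifiability problem that makes unconstrained cutpoint priors awkward.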

Related notebooks

As far as I understand there are no existing notebooks which provide examples for the analysis of ordinal data. The closest I can find is an old PyMC port of Chapter 23 of Kruschke, but that's totally independent of pymc-examples.

References

  • Liddell, T. M. & Kruschke, J. K. Analyzing ordinal data with metric models: What could possibly go wrong? J Exp Soc Psychol 79, 328–348 (2018).
  • Bürkner, P.-C. & Vuorre, M. Ordinal Regression Models in Psychology: A Tutorial. Advances in Methods and Practices in Psychological Science 2, 77–101 (2019).

drbenvincent avatar Feb 04 '22 12:02 drbenvincent

Is there anything blocking this one? I'm interested in this class of models. I couldn't tell whether there is still an issue with setting priors on the cutpoints; it seems it is now possible to pass in a vector. Happy to pick this one up if you like @drbenvincent, but I'm also conscious that you seem to have done a lot of work on it already.

NathanielF avatar Feb 25 '23 15:02 NathanielF

I initially wanted to work on it, but my plate is full at the moment. So no objections from me. No major blocker as far as I can tell.

drbenvincent avatar Mar 02 '23 17:03 drbenvincent

Cool. I'll pick it up after the longitudinal one is done.

NathanielF avatar Mar 03 '23 21:03 NathanielF

Just had a quick look at this one. It seems that even the example in the ordered logistic docstring breaks now. It seems related to the shape attribute of the random variable.

[screenshot]

I'm on the latest version, I think: [screenshot]

NathanielF avatar Mar 17 '23 21:03 NathanielF

Opened a ticket: https://github.com/pymc-devs/pymc/issues/6610

In the meantime I'll experiment a bit more with your constrained uniform function.

NathanielF avatar Mar 17 '23 22:03 NathanielF