ppi_py
sorted_highlow=True in ppi_distribution_label_shift_ci -> form_discrete_distribution
Hi @aangelopoulos -- have finally been playing around with PPI!
I found a potential bug here: https://github.com/aangelopoulos/ppi_py/blob/ac99d9a9504d0189370e96a4e068717b18eeee2c/ppi_py/ppi.py#L1762
With sorted_highlow=True, the call returns the distribution ordered by class frequency, which may not be the same order as the classes in the confusion matrix. So the following line will break unless the classes happen to already be ordered by frequency: https://github.com/aangelopoulos/ppi_py/blob/ac99d9a9504d0189370e96a4e068717b18eeee2c/ppi_py/ppi.py#L1765
In the plankton example this does not cause problems, since there are more data points with label 0 than 1. However, if one were to flip this distribution, i.e.
Y = ~Y
Yhat = ~Yhat
Y_unlabeled = ~Y_unlabeled
Yhat_unlabeled = ~Yhat_unlabeled
The PPI estimate breaks.
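To illustrate the mismatch without depending on the library, here is a minimal sketch. The helper class_frequencies is a hypothetical stand-in for the behavior described above, not the ppi_py implementation, and the matrix A and the toy labels are made up for the example. It only shows that a frequency-sorted vector stops lining up with label-indexed columns once class 1 becomes the majority.

```python
from collections import Counter


def class_frequencies(labels, sorted_highlow=False):
    """Hypothetical stand-in, not the ppi_py implementation.

    Returns empirical class frequencies indexed by class label (0, 1, ...).
    With sorted_highlow=True the frequencies are instead sorted from most
    to least frequent, which discards the label-to-position mapping.
    """
    counts = Counter(labels)
    freqs = [counts[k] / len(labels) for k in sorted(counts)]
    return sorted(freqs, reverse=True) if sorted_highlow else freqs


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Made-up matrix whose columns are indexed by class label (0, then 1).
A = [[0.9, 0.2],
     [0.1, 0.8]]

# Class 0 is the majority: both orderings coincide, so nothing looks wrong.
yhat = [0, 0, 0, 1]
same = class_frequencies(yhat) == class_frequencies(yhat, sorted_highlow=True)

# Class 1 is the majority (the flipped case): the orderings differ.
yhat_flipped = [1, 1, 1, 0]
by_label = class_frequencies(yhat_flipped)                      # [0.25, 0.75]
by_freq = class_frequencies(yhat_flipped, sorted_highlow=True)  # [0.75, 0.25]

aligned = matvec(A, by_label)    # columns and entries both in label order
misaligned = matvec(A, by_freq)  # entry 0 is now class 1's frequency

print(same, by_label, by_freq)
print(aligned, misaligned)
```

In the flipped case the two products differ, which matches the symptom: the result is only right when label order and frequency order happen to agree.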
Removing sorted_highlow=True from this call seems to fix the issue.
Happy to submit a quick PR for this if my understanding is correct here.