
sorted_highlow=True in ppi_distribution_label_shift_ci -> form_discrete_distribution

Open justinkay opened this issue 7 months ago • 1 comment

Hi @aangelopoulos -- have finally been playing around with PPI!

I found a potential bug here: https://github.com/aangelopoulos/ppi_py/blob/ac99d9a9504d0189370e96a4e068717b18eeee2c/ppi_py/ppi.py#L1762

With sorted_highlow=True, form_discrete_distribution returns the distribution ordered by class frequency, which may not match the order of the classes in the confusion matrix. So the following line: https://github.com/aangelopoulos/ppi_py/blob/ac99d9a9504d0189370e96a4e068717b18eeee2c/ppi_py/ppi.py#L1765

will break unless the classes happen to already be ordered by frequency. In the plankton example this does not cause problems, since there are more data points with label 0 than with label 1. However, if one were to flip the labels, i.e.

Y = ~Y
Yhat = ~Yhat
Y_unlabeled = ~Y_unlabeled
Yhat_unlabeled = ~Yhat_unlabeled

The PPI estimate breaks.
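
To illustrate the ordering mismatch outside of ppi_py, here is a minimal NumPy sketch. The helper `discrete_distribution` is a hypothetical stand-in that I wrote to mimic the behavior described above; it is not the library's `form_discrete_distribution`, and the toy labels are made up for illustration.

```python
import numpy as np


def discrete_distribution(labels, n_classes, sorted_highlow=False):
    """Hypothetical stand-in (not the ppi_py implementation).

    Returns class frequencies indexed by class label, or, if
    sorted_highlow is True, sorted from most to least frequent.
    """
    freqs = np.bincount(labels, minlength=n_classes) / len(labels)
    if sorted_highlow:
        freqs = np.sort(freqs)[::-1]
    return freqs


# Class 0 is the majority: both orderings happen to agree.
yhat = np.array([0, 0, 0, 1])
by_class = discrete_distribution(yhat, 2)
by_freq = discrete_distribution(yhat, 2, sorted_highlow=True)

# Flip the labels so class 1 is the majority: index i of the
# frequency-sorted vector no longer corresponds to class i.
yhat_flipped = 1 - yhat
by_class_flipped = discrete_distribution(yhat_flipped, 2)
by_freq_flipped = discrete_distribution(yhat_flipped, 2, sorted_highlow=True)

print(by_class, by_freq)
print(by_class_flipped, by_freq_flipped)
```

In the flipped case the sorted vector puts class 1's mass at index 0, so anything that combines it with a class-indexed confusion matrix would pair the wrong entries.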

Removing sorted_highlow=True from this call seems to fix the issue.

Happy to submit a quick PR for this if my understanding is correct here.

justinkay, Jul 24 '24 17:07