scikit-posthocs

Dunn and missing values

Open · stikpet opened this issue 1 year ago • 4 comments

I think the Dunn test function still takes as the sample size the number of entries in the categorical (group) column, even when the numeric column has no value for that row; i.e. missing values are counted. I don't think this is correct.
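To make the reported problem concrete, here is a small stdlib-only sketch contrasting group sizes taken from the group column alone with sizes that skip missing scores. The helper `group_sizes` is hypothetical and is not part of scikit-posthocs; it only illustrates why counting rows with a missing score inflates n_i (and N), both of which enter Dunn's z statistic.

```python
import math


def group_sizes(values, groups, skip_missing=True):
    """Count observations per group (hypothetical helper).

    If skip_missing is True, rows whose value is None or NaN are ignored,
    so they do not contribute to the group size n_i.
    """
    sizes = {}
    for value, group in zip(values, groups):
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if skip_missing and is_missing:
            continue  # a missing score should not count towards n_i
        sizes[group] = sizes.get(group, 0) + 1
    return sizes


values = [2.1, float("nan"), 3.4, 1.8, None, 5.0]
groups = ["a", "a", "a", "b", "b", "b"]

print(group_sizes(values, groups, skip_missing=False))  # {'a': 3, 'b': 3}
print(group_sizes(values, groups))                      # {'a': 2, 'b': 2}
```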

stikpet · Jun 09 '24 17:06

Handling missing values is out of scope for this package. But something should be done about it, I agree. Maybe throwing a warning will be enough... Anyway, thank you for drawing attention to it.

maximtrp · Jun 10 '24 07:06

A simple dropna() at the beginning should be enough to fix this, or, as you say, a warning about the missing values.
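A minimal, stdlib-only sketch of what "drop first, then test" means for Dunn's test: incomplete rows are removed before ranking, so N and every n_i count only the observations that are actually ranked. It uses the usual tie-corrected variance, N(N+1)/12 - Σ(t³ - t)/(12(N - 1)), and returns unadjusted two-sided p-values with no multiple-comparison correction. The names `dunn_test` and `rank_with_ties` are hypothetical; this is not scikit-posthocs' implementation.

```python
import math
from itertools import combinations


def _is_missing(x):
    return x is None or (isinstance(x, float) and math.isnan(x))


def rank_with_ties(data):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    ranks = [0.0] * len(data)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and data[order[j + 1]] == data[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def dunn_test(values, groups):
    """Unadjusted two-sided Dunn p-values for each pair of groups (sketch).

    Rows with a missing value or group label are dropped first (like dropna()).
    """
    pairs = [(v, g) for v, g in zip(values, groups) if not _is_missing(v) and g is not None]
    data = [v for v, _ in pairs]
    labels = [g for _, g in pairs]
    n_total = len(data)
    ranks = rank_with_ties(data)

    # Mean rank and size per group, computed from complete rows only
    rank_sums, sizes = {}, {}
    for r, g in zip(ranks, labels):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        sizes[g] = sizes.get(g, 0) + 1
    mean_ranks = {g: rank_sums[g] / sizes[g] for g in sizes}

    # Tie correction: sum of (t^3 - t) over each set of t tied values
    counts = {}
    for v in data:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12 * (n_total - 1))
    variance = n_total * (n_total + 1) / 12 - tie_term

    p_values = {}
    for a, b in combinations(sorted(sizes), 2):
        se = math.sqrt(variance * (1 / sizes[a] + 1 / sizes[b]))
        z = (mean_ranks[a] - mean_ranks[b]) / se
        p_values[(a, b)] = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_values
```

With this approach a row such as `(nan, "a")` has no effect at all: the result is identical to running the test on the complete rows only. Whether that removal should happen silently or with a warning is the design question discussed in the rest of the thread.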

stikpet · Jun 13 '24 07:06

Silently dropping data is not a good idea. I will think about it. Maybe we should look at some references: how bigger packages with millions of users handle this.

maximtrp · Jun 13 '24 08:06

I'm not aware of other Python packages that can perform this test. In R, however, the dunn.test function from the dunn.test package gives no warning and simply removes the missing values. Another R package, FSA, has a dunnTest function that does issue a warning: "Some rows deleted from 'x' and 'g' because missing data". A little program from IBM called SPSS Statistics also gives no warning and simply leaves the missing values out of the calculations.
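The FSA-style compromise (remove incomplete rows, but say so) could look roughly like this sketch. The helper name `drop_missing_with_warning` and its warning text are assumptions made for illustration; this is not FSA's or scikit-posthocs' code. Only the idea of warning when rows are deleted comes from the thread.

```python
import math
import warnings


def drop_missing_with_warning(values, groups):
    """Return (values, groups) with incomplete rows removed (hypothetical helper).

    A row is incomplete if its value is None/NaN or its group label is None.
    A UserWarning reports how many rows were dropped, so nothing disappears silently.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    kept_values, kept_groups = [], []
    for v, g in zip(values, groups):
        value_missing = v is None or (isinstance(v, float) and math.isnan(v))
        if value_missing or g is None:
            continue
        kept_values.append(v)
        kept_groups.append(g)
    n_dropped = len(values) - len(kept_values)
    if n_dropped:
        warnings.warn(
            f"{n_dropped} row(s) with missing values were removed before the test.",
            UserWarning,
            stacklevel=2,
        )
    return kept_values, kept_groups
```

Raising a `UserWarning` rather than an error keeps the behaviour of dunn.test and SPSS (the test still runs on the complete rows) while making the deletion visible, and users who find it noisy can filter it with the standard `warnings` machinery.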

Thanks for still following up on this, and of course for sharing your library.

stikpet · Jun 13 '24 20:06

Fixed in v0.10.0

maximtrp · Oct 20 '24 09:10