
TableReport column associations and tables with few rows

Open glemaitre opened this issue 1 year ago • 3 comments

I was somewhat surprised by the display produced by the following:

import skrub
from sklearn.datasets import fetch_california_housing

skrub.patch_display()
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X.head()

image

It took me some time to understand that this was caused by the X.head() call and that, in this case, the output made sense.

I'm wondering whether you should avoid computing all these values when one calls X.head(), instead of showing statistics computed on only a few rows. It can be misleading.

An alternative is to compute the statistics on the full dataset, even if the user only asks for the .head(). However, if you call .head(), it might just be because you are interested in seeing the first couple of rows of the dataframe, without checking any other statistics.

@jeromedockes WDYT?

glemaitre avatar Dec 05 '24 18:12 glemaitre

How could the TableReport access the full dataframe if you pass .head()?

Vincent-Maladiere avatar Dec 06 '24 07:12 Vincent-Maladiere

I did not think about how it is implemented. So it seems that the most reasonable solution is to avoid computing some of the statistics when the sample size is really small (< 10 rows)?

glemaitre avatar Dec 06 '24 07:12 glemaitre

Not computing the associations under a certain sample size makes sense.

Or we could also change the conditions under which we show the red "warning". Cramér's V is an estimate of an effect size, but it does not say anything about significance. By computing it we also get a chi-square statistic and thus a p-value. I wouldn't show the p-value to the user because it is not reliable, as the hypotheses of the test are not verified, etc., but I guess we could still rely on it to decide whether it is worth calling the user's attention to this pair of columns.
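To make the p-value idea concrete, here is a minimal sketch in plain Python (standard library only). It is not how skrub computes associations: the function names, the `v_threshold=0.9` and `alpha=0.05` defaults, and the toy data are all assumptions made for illustration. It computes Cramér's V together with the chi-square p-value, and only flags a pair of columns when both the effect size is large and the p-value is small.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Chi-square survival function P(X > x) for an integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: start from erfc(sqrt(x/2)) (dof = 1), then add one term
    # per extra pair of degrees of freedom.
    total = math.erfc(math.sqrt(half))
    for j in range((dof - 1) // 2):
        total += math.exp(-half) * half ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def cramers_v_with_pvalue(xs, ys):
    """Return (Cramér's V, chi-square p-value) for two categorical sequences."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    r, c = len(rows), len(cols)
    if min(r, c) < 2:
        return 0.0, 1.0  # a constant column carries no association
    chi2 = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    v = math.sqrt(chi2 / (n * (min(r, c) - 1)))
    return v, chi2_sf(chi2, (r - 1) * (c - 1))


def should_flag(xs, ys, v_threshold=0.9, alpha=0.05):
    """Hypothetical rule: flag only strong AND unlikely-by-chance associations."""
    v, p = cramers_v_with_pvalue(xs, ys)
    return v >= v_threshold and p < alpha


# Five all-distinct rows (similar to a .head()): V is 1 but the p-value is large.
head_x, head_y = list("abcde"), list("vwxyz")
print(cramers_v_with_pvalue(head_x, head_y), should_flag(head_x, head_y))
# The same pattern repeated 20 times is flagged.
print(should_flag(head_x * 20, head_y * 20))
```

With five all-distinct rows, V equals 1 while the p-value is around 0.22, so the pair is not flagged under these assumed thresholds; with 100 rows following the same pattern it is.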

Also, we may want to implement the bias correction of the Cramér's V statistic (see Wikipedia).
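For reference, a minimal standard-library sketch of that bias correction, following the formula given on the Wikipedia page for Cramér's V (attributed there to Bergsma). The function names and the `bias_correction` flag are hypothetical and do not correspond to any existing skrub API.

```python
import math
from collections import Counter


def chi2_statistic(xs, ys):
    """Return (chi2, n_row_categories, n_col_categories) for two sequences."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    chi2 = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    return chi2, len(rows), len(cols)


def cramers_v(xs, ys, bias_correction=False):
    """Cramér's V, optionally with the bias correction (hypothetical flag)."""
    n = len(xs)
    chi2, r, c = chi2_statistic(xs, ys)
    if min(r, c) < 2 or n < 2:
        return 0.0
    phi2 = chi2 / n
    if not bias_correction:
        return math.sqrt(phi2 / (min(r, c) - 1))
    # Shrink phi^2 and the effective table dimensions by terms of order 1/(n-1).
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)
    denom = min(r_corr, c_corr) - 1
    if denom <= 0:
        return 0.0  # too few rows for the corrected statistic to be defined
    return math.sqrt(phi2_corr / denom)


head_x, head_y = list("abcde"), list("vwxyz")
print(cramers_v(head_x, head_y))                        # uncorrected
print(cramers_v(head_x, head_y, bias_correction=True))  # corrected
```

On five all-distinct rows the uncorrected statistic is 1 while the corrected one drops to 0, which is the behaviour one would want for a `.head()`-sized table. On a larger table with the same perfect association (for example the same five pairs repeated 20 times), the corrected value stays at 1.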

jeromedockes avatar Dec 06 '24 09:12 jeromedockes