
Feature Request for Shave

Open shambhu112 opened this issue 4 years ago • 1 comments

It would be great to have a function that can shave off the rows and columns of poorly correlated variables, based on a threshold,

i.e. something like shave(min = -0.2, max = 0.2).

This would shave off (i.e. not show) variables whose correlation with any other variable falls within the range above.

shambhu112, Jun 07 '21 15:06

This is actually pretty easy to implement. Basically, you'd have to subset the numeric columns of the dataset with something like data[, sapply(1:ncol(data), function(i) max(abs(cor(data)[-i, i]))) > threshold], which keeps only the columns whose strongest absolute correlation with any other column exceeds the threshold.
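A minimal sketch of that subsetting idea in base R. The helper name shave_weak() and its threshold argument are made up for illustration and are not part of corrmorant. It uses a single symmetric cutoff on the absolute correlation rather than the separate min/max range from the request, and it assumes pairwise complete observations are an acceptable way to handle missing values.

```r
# Hypothetical helper (not part of corrmorant): keep only the numeric columns
# whose strongest absolute correlation with any other column exceeds `threshold`.
shave_weak <- function(data, threshold = 0.2) {
  # restrict to numeric columns, since cor() cannot handle other types
  num <- data[, vapply(data, is.numeric, logical(1)), drop = FALSE]
  # absolute correlation matrix, computed once
  cm <- abs(cor(num, use = "pairwise.complete.obs"))
  # ignore each variable's correlation with itself
  diag(cm) <- NA
  # strongest absolute correlation of each column with any other column
  strongest <- apply(cm, 2, max, na.rm = TRUE)
  num[, strongest > threshold, drop = FALSE]
}

# Toy data: x and y are strongly related, z is independent noise
set.seed(1)
x <- rnorm(200)
toy <- data.frame(x = x, y = x + rnorm(200, sd = 0.1), z = rnorm(200))

kept <- shave_weak(toy, threshold = 0.5)
names(kept)
```

Computing the correlation matrix once, instead of calling cor() inside the loop, avoids redundant work on wide datasets. The reduced data frame can then be passed on to the plotting functions as usual.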

I am not sure if I really want to add such a feature because it does not really fit with the philosophy behind corrmorant: my idea was to provide a versatile tool for data inspection, but to make it extra complicated to use it for data dredging and p-hacking. If you ever wondered why there is no built-in function to add p-values to the correlations, that's the reason (you can do it with add_funtext(), but if you know enough R to find out how, you probably also know why that's not a good idea).

I see why shave() may be useful, but I am not really fond of the idea that people might use it to remove the variables that are not strongly correlated with anything and then publish a paper based on the reduced dataset without mentioning it.

r-link, Jun 09 '21 14:06