h2o-3
Add Warning if User Passes Categorical Columns to h2o.cor()
We should add a warning to the Python/R method h2o.cor() that tells users the method is intended only for numeric columns whenever they pass a categorical column.
We should also add a runit/pyunit test that covers what happens when a user passes a categorical column. Right now it seems that we return NA for categorical columns with more than two levels.
{code}
library(h2o)
h2o.init()

# create a categorical column called k2 with 5 levels and 20 values
k2 = rep(c('her', 'him', 'cat', 'mouse', 'dog'), 4)

# create a categorical column with two levels
k = rep(c('her', 'him'), 10)

# create a numeric column with 20 values
n <- 20
h <- runif(n)

# see what happens if you try to calculate the correlation of a numeric with a binary categorical
h2o.cor(as.h2o(k), as.h2o(h))
# 0.07981525

# see what happens when you try to calculate the correlation of a numeric with a multi-level categorical
h2o.cor(as.h2o(k2), as.h2o(h))
# NA
{code}
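As a rough illustration of the proposed warning, here is a minimal Python sketch of the check on its own, without touching h2o. It assumes the column types arrive as a mapping of column name to type string, with `"enum"` marking a categorical column (my understanding of what `H2OFrame.types` returns in the Python client, but treat that as an assumption). The helper names `categorical_columns` and `warn_if_categorical` are hypothetical and are not part of h2o.

```python
import warnings


def categorical_columns(types):
    """Return the names of columns whose type string is 'enum' (categorical).

    `types` is assumed to be a dict of column name -> type string.
    """
    return [name for name, col_type in types.items() if col_type == "enum"]


def warn_if_categorical(types, arg_name="x"):
    """Emit a UserWarning if any column is categorical.

    Hypothetical helper: returns the list of offending column names so a
    caller (or a unit test) can inspect what triggered the warning.
    """
    cats = categorical_columns(types)
    if cats:
        warnings.warn(
            "cor() is only intended for numeric columns; "
            f"argument '{arg_name}' contains categorical column(s): "
            + ", ".join(cats),
            UserWarning,
            stacklevel=2,
        )
    return cats
```

In a real client this check would presumably run on both arguments before the correlation is computed, e.g. `warn_if_categorical(frame.types)`, and a pyunit test could assert that the warning fires for a frame like `k2` above and stays silent for a purely numeric frame.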
Lauren DiPerna commented: The question originally came up here: https://stackoverflow.com/questions/53265386/how-does-h2o-cor-deal-with-categorical-data
JIRA Issue Migration Info
Jira Issue: PUBDEV-6057
Assignee: Ondrej Nekola
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Can you tell me where the code file is? I am unable to find it.
@tomasfryda I have raised this PR to close this issue. Please review and merge.