h2o-3

Add Warning if User Passes Categorical Columns to h2o.cor()

Open exalate-issue-sync[bot] opened this issue 1 year ago • 4 comments

We should add a warning to the Python/R method h2o.cor() that tells users the method is intended only for numeric columns whenever they pass a categorical column.

We should also add a runit/pyunit test that checks what happens when a user passes a categorical column. Right now it seems that we return NA for categorical columns with more than two levels.
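A minimal Python sketch of the kind of check proposed above, not the actual h2o-3 implementation. It assumes column types are available as a name-to-type mapping with labels such as "enum", "real", and "int" (similar in shape to what `H2OFrame.types` reports). The names `warn_if_categorical` and `NUMERIC_TYPES` are hypothetical.

```python
import warnings

# Assumed set of type labels treated as numeric (hypothetical, for illustration).
NUMERIC_TYPES = {"int", "real"}


def warn_if_categorical(column_types):
    """Warn about columns that are not numeric and return their names.

    column_types: dict mapping column name -> type label, e.g. {"k2": "enum"}.
    """
    non_numeric = [name for name, ctype in column_types.items()
                   if ctype not in NUMERIC_TYPES]
    if non_numeric:
        warnings.warn(
            "cor() is intended for numeric columns only; "
            "non-numeric columns passed: " + ", ".join(non_numeric),
            UserWarning,
        )
    return non_numeric
```

A runit/pyunit test could then assert that the warning fires for a frame with a categorical column and stays silent for an all-numeric frame.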

{code}
library(h2o)
h2o.init()

# create a categorical column called k2 with 5 levels and 20 values
k2 = rep(c('her', 'him', 'cat', 'mouse', 'dog'), 4)

# create a categorical column with two levels
k = rep(c('her', 'him'), 10)

# create a numeric column with 20 values
n <- 20
h <- runif(n)

# see what happens if you try to calculate the correlation of a numeric with a binary categorical
h2o.cor(as.h2o(k), as.h2o(h))
# 0.07981525

# see what happens when you try to calculate the correlation of a numeric with a multi-level categorical
h2o.cor(as.h2o(k2), as.h2o(h))
# NA
{code}
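The thread does not explain why the two-level case returns a number. One possible reading, stated here as an assumption rather than a description of H2O internals, is that a two-level column has a natural 0/1 coding, for which Pearson correlation is well-defined (the point-biserial correlation), while a column with five unordered levels has no such coding. The pure-Python sketch below illustrates only that statistical point. The helper `pearson` and the sample values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two-level categorical, coded 0/1 by level (illustrative data).
k = ["her", "him"] * 4
codes = [0 if level == "her" else 1 for level in k]
h = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]

r = pearson(codes, h)
# Swapping which level is 0 and which is 1 only flips the sign.
r_swapped = pearson([1 - c for c in codes], h)
```

With more than two unordered levels, any integer coding imposes an arbitrary order, so a single correlation value would depend on that choice.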

exalate-issue-sync[bot], May 13 '23 04:05

Lauren DiPerna commented: This question originally came up here: https://stackoverflow.com/questions/53265386/how-does-h2o-cor-deal-with-categorical-data

exalate-issue-sync[bot], May 13 '23 04:05

JIRA Issue Migration Info

Jira Issue: PUBDEV-6057
Assignee: Ondrej Nekola
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A

hasithjp, May 15 '23 08:05

Can you tell me where the code file is? I am unable to find it.

kru2710shna, Feb 10 '24 02:02

@tomasfryda I have raised this PR to close this issue. Please review and merge.

Devanshusisodiya, Feb 12 '24 03:02