Biserial and point-biserial correlations should check and probably coerce input

Open strengejacke opened this issue 3 years ago • 5 comments

This may be related to #79. When the dichotomous variable is a factor, the call fails with an error. The input should be checked and coerced to numeric before processing:

mtcars$am <- as.factor(mtcars$am)
correlation::correlation(mtcars[c("am", "hp")], method = "biserial")
#> Warning: It seems like there is not enough continuous variables in your data.
#> Maybe you want to include the factors? We're setting `include_factors=TRUE` for
#> you.
#> Error in .cor_test_biserial(data, x, y, ci = ci, method = method, ...): Biserial and point-biserial correlations can only be applied for one dichotomous and one continuous variables.

Created on 2021-04-28 by the reprex package (v2.0.0)
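
The kind of check-and-coerce step the issue asks for could look roughly like the sketch below. `coerce_dichotomous()` is a hypothetical helper written for illustration in base R; it is not a function of the correlation package and not necessarily how a fix would be implemented there. It only touches two-level factors, so factors with more levels pass through unchanged.

```r
# Hypothetical helper (NOT part of the correlation package): coerce a
# two-level factor to 0/1 and return any other input unchanged.
coerce_dichotomous <- function(x) {
  if (is.factor(x) && nlevels(x) == 2L) {
    # as.integer() returns the level codes 1 and 2; subtract 1 to get 0 and 1
    return(as.integer(x) - 1L)
  }
  x
}

dat <- mtcars[c("am", "hp")]
dat$am <- as.factor(dat$am)               # reproduce the problematic input
dat[] <- lapply(dat, coerce_dichotomous)  # apply the check column by column
str(dat)
```

Note that the 0/1 coding follows the order of the factor levels, so the sign of the resulting correlation depends on which level comes first.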

strengejacke, Apr 28 '21 14:04

This can be resolved if the following

https://github.com/easystats/correlation/blob/a66b185ab8a3aefb12775fbb5d1500b813f7dc0e/R/utils_clean_data.R#L5

is changed to

data <- parameters::convert_data_to_numeric(data, dummy_factors = FALSE)

But I am not sure if that is the correct solution.

The key difference, of course, is that the factor variable is not dummy-coded:

head(parameters::convert_data_to_numeric(mtcars[c("am", "hp")]))
#>   am  hp
#> 1  1 110
#> 2  1 110
#> 3  1  93
#> 4  0 110
#> 5  0 175
#> 6  0 105

head(parameters::convert_data_to_numeric(mtcars[c("am", "hp")], FALSE))
#>   am  hp
#> 1  1 110
#> 2  1 110
#> 3  1  93
#> 4  0 110
#> 5  0 175
#> 6  0 105

Created on 2021-04-29 by the reprex package (v2.0.0)
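
In the reprex above, `am` appears to still be numeric (a reprex runs in a fresh session), so both calls print the same values. To make the distinction visible, here is a base R sketch of dummy coding versus direct numeric conversion of a two-level factor. It uses `model.matrix()` as a stand-in for dummy coding and is only an illustration of the concept; it does not claim to reproduce the exact columns that `parameters::convert_data_to_numeric()` returns.

```r
am <- factor(mtcars$am)

# Dummy coding: one indicator column per factor level (two columns here)
dummies <- model.matrix(~ am - 1)
colnames(dummies)

# Direct conversion: a single numeric column holding the original 0/1 values
am_numeric <- as.numeric(as.character(am))
head(am_numeric)
```

With dummy coding the single dichotomous variable is replaced by indicator columns, whereas direct conversion keeps one column that can be paired with the continuous variable.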

IndrajeetPatil, Apr 29 '21 06:04

Output with the proposed solution:

mtcars$am <- as.factor(mtcars$am)
correlation::correlation(mtcars[c("am", "hp")], method = "biserial")
#> Warning: It seems like there is not enough continuous variables in your data.
#> Maybe you want to include the factors? We're setting `include_factors=TRUE` for
#> you.
#> # Correlation Matrix (biserial-method)
#> 
#> Parameter1 | Parameter2 |   rho |        95% CI | t(30) |     p
#> ---------------------------------------------------------------
#> am         |         hp | -0.30 | [-0.59, 0.05] | -1.74 | 0.092
#> 
#> p-value adjustment method: Holm (1979)
#> Observations: 32

Created on 2021-04-29 by the reprex package (v2.0.0)

IndrajeetPatil, Apr 29 '21 06:04

cc @DominiqueMakowski

IndrajeetPatil, Jun 23 '21 10:06

How do we currently handle polyserial and polychoric correlations with factors with more than 2 levels?

bwiernik, Jun 23 '21 13:06

I can confirm this error and that the solution of converting the factor to numeric before calling correlation() worked for me.
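
For reference, a minimal sketch of that user-side workaround, assuming the factor levels are the strings "0" and "1" as in the `mtcars` example. Going through `as.character()` keeps the original values rather than the internal level codes. The call to `correlation()` is guarded so the snippet also runs where the package is not installed.

```r
dat <- mtcars[c("am", "hp")]
dat$am <- as.factor(dat$am)  # reproduce the problematic input

# Workaround: convert the factor back to numeric before calling correlation()
dat$am <- as.numeric(as.character(dat$am))

if (requireNamespace("correlation", quietly = TRUE)) {
  print(correlation::correlation(dat, method = "biserial"))
}
```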

mario-bermonti, Apr 18 '24 19:04