correlation
Biserial and point-biserial correlations should check and probably coerce input
This may be related to #79. When the dichotomous variable is stored as a factor, the call fails with an error. The input should be checked and coerced to numeric before processing:
mtcars$am <- as.factor(mtcars$am)
correlation::correlation(mtcars[c("am", "hp")], method = "biserial")
#> Warning: It seems like there is not enough continuous variables in your data.
#> Maybe you want to include the factors? We're setting `include_factors=TRUE` for
#> you.
#> Error in .cor_test_biserial(data, x, y, ci = ci, method = method, ...): Biserial and point-biserial correlations can only be applied for one dichotomous and one continuous variables.
Created on 2021-04-28 by the reprex package (v2.0.0)
This can be resolved if the following
https://github.com/easystats/correlation/blob/a66b185ab8a3aefb12775fbb5d1500b813f7dc0e/R/utils_clean_data.R#L5
is changed to
data <- parameters::convert_data_to_numeric(data, dummy_factors = FALSE)
But I am not sure if that is the correct solution.
The key difference, of course, is that the factor variable is not dummy-coded:
head(parameters::convert_data_to_numeric(mtcars[c("am", "hp")]))
#> am hp
#> 1 1 110
#> 2 1 110
#> 3 1 93
#> 4 0 110
#> 5 0 175
#> 6 0 105
head(parameters::convert_data_to_numeric(mtcars[c("am", "hp")], FALSE))
#> am hp
#> 1 1 110
#> 2 1 110
#> 3 1 93
#> 4 0 110
#> 5 0 175
#> 6 0 105
Created on 2021-04-29 by the reprex package (v2.0.0)
Output with the proposed solution:
mtcars$am <- as.factor(mtcars$am)
correlation::correlation(mtcars[c("am", "hp")], method = "biserial")
#> Warning: It seems like there is not enough continuous variables in your data.
#> Maybe you want to include the factors? We're setting `include_factors=TRUE` for
#> you.
#> # Correlation Matrix (biserial-method)
#>
#> Parameter1 | Parameter2 | rho | 95% CI | t(30) | p
#> ---------------------------------------------------------------
#> am | hp | -0.30 | [-0.59, 0.05] | -1.74 | 0.092
#>
#> p-value adjustment method: Holm (1979)
#> Observations: 32
Created on 2021-04-29 by the reprex package (v2.0.0)
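For illustration, the "check and coerce" idea from the title could be sketched as below. This is only a sketch under assumptions: `coerce_dichotomous()` is a hypothetical helper, not a function in correlation or parameters, and it assumes that mapping the first factor level to 0 and the second to 1 is acceptable. Factors with more than two levels are deliberately left untouched.

```r
# Hypothetical helper (not part of the correlation package): coerce
# two-level factors to 0/1 and leave every other column untouched.
coerce_dichotomous <- function(data) {
  for (col in names(data)) {
    x <- data[[col]]
    if (is.factor(x) && nlevels(droplevels(x)) == 2L) {
      # Factor codes are 1 and 2, so subtracting 1 gives 0 and 1
      data[[col]] <- as.integer(droplevels(x)) - 1L
    }
  }
  data
}

df <- mtcars[c("am", "hp")]
df$am <- as.factor(df$am)
str(coerce_dichotomous(df))
```

Coercing only two-level factors keeps the existing error in place for inputs that are not actually dichotomous.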
cc @DominiqueMakowski
How do we currently handle polyserial and polychoric correlations with factors with more than 2 levels?
I can confirm this error, and converting the factor to numeric before calling correlation() worked for me.
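A minimal sketch of that workaround, assuming the factor labels are themselves numeric ("0" and "1", as with `mtcars$am`). The `requireNamespace()` guard is only there so the snippet also runs where correlation is not installed.

```r
df <- mtcars[c("am", "hp")]
df$am <- as.factor(df$am)  # reproduce the failing input

# Go through as.character() so the labels "0"/"1" are recovered,
# rather than the internal factor codes 1/2
df$am <- as.numeric(as.character(df$am))

if (requireNamespace("correlation", quietly = TRUE)) {
  print(correlation::correlation(df, method = "biserial"))
}
```

Calling `as.numeric()` directly on the factor would return the codes 1 and 2 instead of 0 and 1. The variable is still dichotomous in that case, but it no longer matches the original coding.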