Consolidate `coerce_to_numeric()` into `to_numeric()` via a new argument
I think the currently existing argument preserve_levels is actually most appropriate to trigger coerce_to_numeric(), but then we have to find a new name for preserve_levels.
@bwiernik @vincentarelbundock what do you think? Do you have suggestions for argument names?
Currently, the preserve_labels argument only works for factors with numeric levels. If preserve_levels = FALSE, the returned numeric vector starts with 1. If preserve_levels = TRUE, the returned numeric vector contains the factor levels, coerced to numeric.
If a vector has character or string levels, preserve_levels has no effects and always a numeric vector, starting at 1, is returned. However, sometimes you want to convert only factors with numeric levels into numeric vectors, and factors with character levels should be left unchanged. This is what coerce_to_numeric() does. However, it would be nice to integrate that function into to_numeric(), and just have a new argument.
I think the currently existing argument preserve_levels is actually most appropriate to trigger coerce_to_numeric(), but then we have to find a new name for preserve_levels.
Suggestions?
library(datawizard)
f1 <- factor(c(1, 2, 3, 4))
f2 <- factor(c(5, 6, 7 , 8))
to_numeric(f1, dummy_factors = FALSE)
#> [1] 1 2 3 4
to_numeric(f2, dummy_factors = FALSE)
#> [1] 1 2 3 4
to_numeric(f2, dummy_factors = FALSE, preserve_levels = TRUE)
#> [1] 5 6 7 8
f1 <- factor(c("a", "b", "c", "d"))
to_numeric(f1, dummy_factors = FALSE)
#> [1] 1 2 3 4
to_numeric(f1, dummy_factors = FALSE, preserve_levels = TRUE)
#> [1] 1 2 3 4
# which argument for `to_numeric()` could behave like `coerce_to_numeric()`?
coerce_to_numeric(f1)
#> [1] a b c d
#> Levels: a b c d
Created on 2022-08-05 by the reprex package (v2.0.1)
Sorry, I'm probably not the right person to ask. I'm a grumpy old man who likes base R as.numeric
😣
I'm not asking you about the functionality ;-) I'd like to know from a native speaker which argument name sounds best for this task :-) And you did your Bsc in 2007, you can't be older than me. 😝
Honestly, I think I just want to challenge the premise of the question: I think a function called to_numeric() should always return a numeric, and it feels like not great design for the output class to depend on the input class and/or arguments. If users want a character, they should call to_character()
The reason for this discussion is that we had several internal .to_numeric() functions that should be replaced by the datawizard function, to avoid spread of code and increase maintainability. However, the internal functions require different handlings. In some cases, to_numeric(factor(c("a", "b", "c", "d"))) should return a numeric, in other cases the original factor, if levels cannot be coerced to numeric.
So we need an additional argument, unless you want to maintain all our internal functions. :-)
I would argue that in those cases clarity and maintainability is best achieved by a standard as.numeric(as.character(x)) or somesuch.
Being more verbose and sticking with base R is often preferable, IMHO.
But obviously, I haven't seen all the use cases, so you know better.
And to be clear, this isn't a big deal to me; I just have difficulty to not give opinions.
I generally agree with Vincent that to_numeric() should consistently
If an argument is used to trigger this conditional behavior, it should be very explicit that this is what the argument does, and it should be independent of the preserve_levels behavior. Something like .only_numberlike (defaulting to false).
I would prefer a separate function like maybe_to_numeric(), but would be okay with an argument like .only_numberlike.
maybe_to_numeric() is the current coerce_to_numeric() (if I understood correctly). So you would opt to not consolidate the two present functions, like suggested in this issue?
I really don't understand the context where to_numeric(factor(c("a", "b", "c", "d"))) should be expected to return a, b, c, d? Why are we trying to convert to numeric and not expecting to always have numeric?
I don't know, but it's used in modelbased for example: https://github.com/search?p=2&q=org%3Aeasystats+to_numeric&type=Code
So coerce_to_numeric() is used only in one place in {easystats} ecosystem, which is in {datawizard} itself:
https://github.com/easystats/datawizard/blob/34cf6bf73cb974a5ab1c17318a4f6b34d3744752/R/data_reshape.R#L151
If we change this to something like
data[["_Row"]] <- as.numeric(row.names(data))
we can remove this function. WDYT?
Its usage in {modelbased} is covered in a separate issue: https://github.com/easystats/modelbased/issues/206
we can remove this function. WDYT?
No, make to internal instead. Else reshape_longer() no longer works.