janitor icon indicating copy to clipboard operation
janitor copied to clipboard

feature suggestion: parameter to specify decimal separator on adorn_pct_formatting

Open mgacc0 opened this issue 2 years ago • 7 comments

Would you consider of interest adding a parameter "decimal.mark" (or similar) to adorn_pct_formatting?

mgacc0 avatar Jul 30 '21 12:07 mgacc0

Yes, that sounds like a good idea. This may be a blind spot for me as an American - is this for countries where what I know as 99.1% would be written 99,1% ?

sfirke avatar Aug 02 '21 14:08 sfirke

I'm from Europe. Internally (on scripts and storage) I often use decimal point. But for reports and output sometimes it's a requirement to use decimal comma. There are a lot of countries that use decimal comma: https://en.wikipedia.org/wiki/File:DecimalSeparator.svg

mgacc0 avatar Aug 02 '21 21:08 mgacc0

Whoa that is a lot indeed! Yes, let's add this. I am not sure when I'll be able to implement it. I'd accept a pull request if someone wants to add it, or if none comes, then I'll hopefully add it down the road.

I'm trying to think through the implementation...

  • Are there other functions I'd need to modify? I don't think so...
  • Are there other packages that already do this? I could study them, borrow from them, or maybe call them directly

sfirke avatar Aug 07 '21 20:08 sfirke

adorn_pct_formatting converts numeric to string. So it should be the only one where it's needed.

As a reference: number from scales package https://github.com/r-lib/scales/blob/master/R/label-number.r internally calls the base::format function.

mgacc0 avatar Aug 08 '21 19:08 mgacc0

Just a thought here that I haven't looked into due to lack of time, but isn't this a locale setting for R? Perhaps the function could look to that for info, rather than requiring user input?

jzadra avatar Aug 09 '21 16:08 jzadra

Use getOption("OutDec") as the default value for decimal.mark parameter:

adorn_pct_formatting <- function(dat, digits = 1, decimal.mark = getOption("OutDec"), rounding = "half to even", affix_sign = TRUE, ...)

mgacc0 avatar Aug 11 '21 09:08 mgacc0

Ooh nice. Okay I can take a shot at this. Then when I get it on a branch, I'll post here and would love for someone in a comma-using locale to try it out.

sfirke avatar Aug 11 '21 14:08 sfirke

@mgacc0 I have a draft of this as https://github.com/sfirke/janitor/pull/510. Can you take a look please? Hoping to finalize it in the next week or so to have it ready to go CRAN.

I ended up not adding a new argument. Instead it pulls from a user's locale, the value of getOption("OutDec"). There's a cost to adding a new argument, in that it breaks code that relied on column names being the 5th variable, like this test: https://github.com/sfirke/janitor/blob/main/tests/testthat/test-adorn-pct-formatting.R#L164. I was encouraging people to use the ,,, paradigm to get to the tidyselect cols argument and this would break that pattern.

My thinking was that using the user's locale would do the job virtually all the time, and in rare cases where it doesn't, they can set options(OutDec= ",") before calling the function.

sfirke avatar Jan 05 '23 02:01 sfirke

Also @YonghuiDong I see you gave 👍 above, please share if you have any thoughts on the PR linked above

sfirke avatar Jan 05 '23 02:01 sfirke

I haven't heard from any potential user of this feature. I'm going to merge to main branch so it's easier for people to install the dev version, then put out a call on Mastodon and see if I can find anyone to give feedback.

sfirke avatar Jan 10 '23 17:01 sfirke

I am sorry to report that the feature does not seem to be working - when using the Czech locale (we are one of the European troublemakers) I still get the dot as decimal separator; see the output below.

Also, on a personal note: it is very common practice for R users from non-English speaking countries to have set English locale as default. There are many reasons for that - such as that you don't really want to google for error messages in some marginal language such as mine, when there are troves of English resources just lying around. Also it is not the bestest of practice to have commas as decimal marks in your actual code; you want dots to make it universal.

As a consequence going by the decimal separator from locale can be unreliable. Many of the target users will have their development environment in English, but will be producing locally flavored output.

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] Czech_Czechia.1250
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] janitor_2.1.0.9000 dplyr_1.0.10      

loaded via a namespace (and not attached):
 [1] rstudioapi_0.14  magrittr_2.0.3   tidyselect_1.2.0 timechange_0.1.1 R6_2.5.1         rlang_1.0.6     
 [7] fansi_1.0.3      stringr_1.5.0    tools_4.2.0      utf8_1.2.2       cli_3.6.0        DBI_1.1.3       
[13] withr_2.5.0      ellipsis_0.3.2   assertthat_0.2.1 tibble_3.1.8     lifecycle_1.0.3  purrr_1.0.0     
[19] tidyr_1.2.1      vctrs_0.5.1      glue_1.6.2       snakecase_0.11.0 stringi_1.7.8    compiler_4.2.0  
[25] pillar_1.8.1     generics_0.1.3   lubridate_1.9.0  pkgconfig_2.0.3 


> mtcars %>%
+   tabyl(am, cyl) %>%
+   adorn_percentages("col") %>%
+   adorn_pct_formatting()
 am     4     6     8
  0 27.3% 57.1% 85.7%
  1 72.7% 42.9% 14.3%

jlacko avatar Jan 11 '23 17:01 jlacko

Thank you @jlacko for the user perspective! To check something: would it be a viable workflow for a user who wanted commas in their output tabyl but not the rest of the time to set options(OutDec= ",") before printing their tabyls?

Based on your feedback I'm thinking there should be an explicit argument in the adorn_pct_formatting function that allows for decimal mark control. Do you/others think that the default value should be . or the locale-specific getOption("OutDec") ?

I don't understand why the locale marker is not getting picked up in your case, but maybe it doesn't matter if that workflow is undesirable anyway.

sfirke avatar Jan 11 '23 18:01 sfirke

Having thought about the issue a bit I believe the approach using OutDec option makes great sense.

For two reasons:

  • the use case for comma as decimal mark is not on actual code, but on generated output - documents in R markdown & now Quarto. In generating markdown & quarto documents is common practice to set options for stuff like {knitr} in the special setup chunk. So there is precedent.
  • the option(OutDec=",") guides not only janitor::adorn_pct_formatting() behavior specifically, but the decimal separator generally; such as when called via base::print(). So setting it once at document level should lead to a consistent behavior across all printing functions.

I confess I am liking the approach the more I think about it...

jlacko avatar Jan 11 '23 20:01 jlacko

Alright, so I'd encourage use of option(OutDec) regardless of locale it sounds like. And I guess in a case like yours where it isn't getting picked up via the locale setting, that's fine (nothing to be done about it) and you could just control it via that parameter?

In that case I could leave the current code as-is and I would just need to add to the function documentation that users can directly control the printing with options(OutDec= ",") at the .Rprofile, file, or chunk level. How does that sound?

sfirke avatar Jan 12 '23 02:01 sfirke

Yes, that is what I would do. And in encouraging the use of option(OutDec) in package documentation please mention that this setting also guides the behavior of base print, so both are aligned using one parameter - which is kind of neat, if you think of it :)

jlacko avatar Jan 12 '23 06:01 jlacko

I'll implement the documentation in a separate issue, to reflect that the code-writing part of this is done: https://github.com/sfirke/janitor/issues/520

sfirke avatar Jan 30 '23 16:01 sfirke