stringi
No warning for invalid locales
stringi::stri_sort("a", opts_collator = stringi::stri_opts_collator(locale = "doesntexist"))
#> [1] "a"
Created on 2022-04-14 by the reprex package (v2.0.1)
Originally filed in https://github.com/tidyverse/stringr/issues/440
As far as I remember, ICU is quite tolerant with regard to what it accepts as a valid locale id, and it tries hard to fall back to something closely approximating what the user needs (as per https://unicode-org.github.io/icu/userguide/locale/ and https://unicode-org.github.io/icu/userguide/locale/resources.html).
It might be a good idea to implement what you request based on what https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1Collator.html says about static Collator* icu::Collator::createInstance ( const Locale & loc, UErrorCode & err )
The UErrorCode& err parameter is used to return status information to the user. To check whether the construction succeeded or not, you should check the value of U_SUCCESS(err). If you wish more detailed information, you can check for informational error results which still indicate success. U_USING_FALLBACK_ERROR indicates that a fall back locale was used. For example, 'de_CH' was requested, but nothing was found there, so 'de' was used. U_USING_DEFAULT_ERROR indicates that the default locale data was used; neither the requested locale nor any of its fall back locales could be found.
So the above would be an instance of U_USING_FALLBACK_WARNING or U_USING_DEFAULT_WARNING
https://unicode-org.github.io/icu/userguide/locale/resources.html
I wonder if it's worth warning/messaging on U_USING_FALLBACK_ERROR and erroring on U_USING_DEFAULT_ERROR?
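For illustration only, the warn-on-fallback / error-on-default idea can be roughly approximated from the R side. This is a sketch, not how stringi or ICU decide things: `check_locale` is a hypothetical helper, it compares against `stringi::stri_locale_list()` by default, and it treats "strip everything after the first `_` or `-`" as the fallback chain, which is much cruder than ICU's real resource lookup (no canonicalization, no script or variant handling).

```r
# Hypothetical helper, not part of stringi: classify a locale id as an exact
# match, a language-only fallback, or unknown, based on a list of known ids.
# `known` defaults to the locales ICU reports via stringi; it is an argument
# so the logic can be tried without stringi installed.
check_locale <- function(locale, known = stringi::stri_locale_list()) {
  if (locale %in% known) {
    return("exact")
  }
  # Crude stand-in for ICU's fallback: keep only the language subtag.
  lang <- sub("[_-].*$", "", locale)
  if (lang %in% known) {
    warning("locale '", locale, "' not found; '", lang, "' would be used",
            call. = FALSE)
    return("fallback")
  }
  stop("locale '", locale, "' and its fallbacks not found", call. = FALSE)
}
```

With this sketch, `check_locale("doesntexist")` would stop with an error, while an id whose language part is known but whose region is not would only warn.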
Receiving U_USING_DEFAULT_WARNING when requesting a Collator (and a few other services) now triggers a warning if an explicitly set locale makes ICU return a resource bundle from the root locale:
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="C")
[1] "a" "ą" "c" "ch" "h"
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="en")
[1] "a" "ą" "c" "ch" "h"
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="pl")
[1] "a" "ą" "c" "ch" "h"
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="sk")
[1] "a" "ą" "c" "h" "ch"
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="unknown")
[1] "a" "ą" "c" "ch" "h"
Warning message:
In stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale = "unknown") :
A resource bundle lookup returned a result either from the root or the default locale.
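If an error is preferred over a warning for an unknown locale, the warning shown above can be promoted on the caller's side. A minimal sketch, assuming a hypothetical wrapper named `warnings_as_errors` (not part of stringi); note that it promotes every warning raised while evaluating the expression, not only the resource bundle one.

```r
# Hypothetical wrapper: evaluate `expr` and turn any warning it raises into
# an error that carries the same message.
warnings_as_errors <- function(expr) {
  tryCatch(
    expr,  # the lazily evaluated argument is forced here, inside tryCatch
    warning = function(w) stop(conditionMessage(w), call. = FALSE)
  )
}
```

Wrapping the last call above, as in `warnings_as_errors(stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale = "unknown"))`, should then fail instead of silently sorting with root-locale rules, provided the installed stringi version emits that warning.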
Thanks!