No warning for invalid locales

Open hadley opened this issue 2 years ago • 3 comments

stringi::stri_sort("a", opts_collator = stringi::stri_opts_collator(locale = "doesntexist"))
#> [1] "a"

Created on 2022-04-14 by the reprex package (v2.0.1)

Originally filed in https://github.com/tidyverse/stringr/issues/440

hadley avatar Apr 14 '22 22:04 hadley

As far as I remember, ICU is quite tolerant with regard to what it accepts as a valid locale id, and it tries hard to fall back to something closely approximating what the user needs (as per https://unicode-org.github.io/icu/userguide/locale/ and https://unicode-org.github.io/icu/userguide/locale/resources.html).

It might be a good idea to implement what you request based on what https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1Collator.html says about static Collator* icu::Collator::createInstance(const Locale& loc, UErrorCode& err):

The UErrorCode& err parameter is used to return status information to the user. To check whether the construction succeeded or not, you should check the value of U_SUCCESS(err). If you wish more detailed information, you can check for informational error results which still indicate success. U_USING_FALLBACK_ERROR indicates that a fall back locale was used. For example, 'de_CH' was requested, but nothing was found there, so 'de' was used. U_USING_DEFAULT_ERROR indicates that the default locale data was used; neither the requested locale nor any of its fall back locales could be found.

gagolews avatar Apr 15 '22 02:04 gagolews

So the above would be an instance of U_USING_FALLBACK_WARNING or U_USING_DEFAULT_WARNING

https://unicode-org.github.io/icu/userguide/locale/resources.html
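
Because ICU falls back silently, one possible R-side workaround is to compare the requested identifier with the locales ICU reports data for before sorting. The sketch below is only an illustration: has_locale_data() is a hypothetical helper, not part of stringi, and it assumes that membership in stringi::stri_locale_list() is a good enough proxy for "will not fall back". The check is coarse, since identifiers carrying ICU keywords, or regional variants that legitimately fall back to their parent language, are not in that list either.

```r
library(stringi)

# Hypothetical helper (not part of stringi): TRUE when `locale` is one of the
# identifiers returned by stri_locale_list(), i.e. ICU ships data under that id.
has_locale_data <- function(locale) {
  locale %in% stri_locale_list()
}

has_locale_data("pl")           # a locale ICU normally ships data for
has_locale_data("doesntexist")  # the identifier from the original report
```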

gagolews avatar Apr 15 '22 02:04 gagolews

I wonder if it's worth warning/messaging on U_USING_FALLBACK_ERROR and erroring on U_USING_DEFAULT_ERROR?

hadley avatar Apr 15 '22 12:04 hadley

When a Collator (or one of a few other services) is requested with an explicitly set locale and ICU ends up returning a resource bundle from the root locale (U_USING_DEFAULT_WARNING), stringi now issues a warning:

> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="C")
[1] "a"  "ą"  "c"  "ch" "h" 
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="en")
[1] "a"  "ą"  "c"  "ch" "h" 
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="pl")
[1] "a"  "ą"  "c"  "ch" "h" 
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="sk")
[1] "a"  "ą"  "c"  "h"  "ch"
> stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale="unknown")
[1] "a"  "ą"  "c"  "ch" "h" 
Warning message:
In stringi::stri_sort(c("a", "c", "ch", "h", "ą"), locale = "unknown") :
  A resource bundle lookup returned a result either from the root or the default locale.
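
Callers who would rather get an error than a warning can promote the warning themselves. The sketch below assumes a stringi version that includes the change shown above. strict_sort() is a hypothetical wrapper, not part of stringi, and it treats any warning raised by stri_sort() as fatal, not only the locale one.

```r
library(stringi)

# Hypothetical wrapper (not part of stringi): sort `x` with the given locale,
# but stop if stri_sort() signals a warning along the way.
strict_sort <- function(x, locale) {
  tryCatch(
    stri_sort(x, locale = locale),
    warning = function(w) {
      stop("locale '", locale, "' was not resolved cleanly: ",
           conditionMessage(w), call. = FALSE)
    }
  )
}

strict_sort(c("a", "c", "ch", "h", "ą"), "sk")
# With the warning shown above, this call would be expected to stop instead:
# strict_sort(c("a", "c", "ch", "h", "ą"), "unknown")
```

Using withCallingHandlers() in place of tryCatch() would be an alternative if only the message matching the locale fallback should be escalated and other warnings should pass through.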

gagolews avatar Nov 07 '23 04:11 gagolews

Thanks!

hadley avatar Nov 07 '23 12:11 hadley