Functions that depend on the current C locale
Some LibC funs, like `strtod` in `String#to_f64?` and `snprintf` in `Float::Printer#internal`, depend on the currently active C locale. This means some oddities can happen if a different locale is active:
```crystal
lib LibC
  LC_ALL = 6 # glibc's value; other platforms (e.g. macOS) use a different one
  fun setlocale(category : Int, locale : Char*) : Char*
end

"1,23".to_f64? # => nil
1.23.to_s      # => "1.23"
1.23.to_s.to_f # => 1.23
"%g" % 1.23    # => "1.23" # `String::Formatter#float` also uses `snprintf`
1e23.to_s      # => "9.9999999999999992e+22" # Grisu failure case, triggers the `snprintf` path

# the decimal and thousands separators in German are swapped
LibC.setlocale(LibC::LC_ALL, "de_DE.UTF-8")

"1,23".to_f64? # => 1.23
1.23.to_s      # => "1.23"
1.23.to_s.to_f # Invalid Float64: "1.23" (ArgumentError)
"%g" % 1.23    # => "1,23"
1e23.to_s      # => "9,9999999999999992e+22"
```
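As a stopgap, a program can query the current numeric locale with `setlocale(category, NULL)` and temporarily switch back to `"C"` around a locale-sensitive call. This is only a sketch assuming glibc (where `LC_NUMERIC` is `1`); `LibLocaleCtl` and `with_c_numeric_locale` are hypothetical names. Because `setlocale` changes process-wide state, it is not safe if other threads could read or change the locale at the same time.

```crystal
# Hypothetical binding, kept separate from LibC to avoid clashing with it.
lib LibLocaleCtl
  fun setlocale(category : Int32, locale : UInt8*) : UInt8*
end

LC_NUMERIC = 1 # glibc's value (assumption); check <locale.h> elsewhere

# Hypothetical helper: runs the block with LC_NUMERIC set to "C" and
# restores the previous setting afterwards, even if the block raises.
def with_c_numeric_locale(&)
  # Passing NULL only queries the locale. Copy the result, because the
  # returned buffer may be overwritten by the next setlocale call.
  previous = String.new(LibLocaleCtl.setlocale(LC_NUMERIC, Pointer(UInt8).null))
  LibLocaleCtl.setlocale(LC_NUMERIC, "C")
  begin
    yield
  ensure
    LibLocaleCtl.setlocale(LC_NUMERIC, previous)
  end
end

value = with_c_numeric_locale { "1.23".to_f64? }
puts value
```

This only narrows the window in which a foreign locale is visible; it does not make the standard library itself locale-independent.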
All programs start in the `"C"` locale, but third-party shards might nonetheless change it, leading to hard-to-debug scenarios like the ones above. As I understand it, the entire Crystal standard library should be locale-independent.
Is there anything we could do here apart from reimplementing all of LibC's locale-dependent functions in Crystal? (There are probably other kinds of global state in the C runtime to avoid too.)
Here is a list of non-Windows functions I could find that depend on the C locale:
- [x] `dprintf`: Affects the floating-point format specifiers (`%a`, `%e`, `%f`, `%g`). Appears in `Crystal::System.print_error`, which is used in many places but never formats floats. Locale-dependent behavior might affect compiler specs, but I don't think there is a big need to replace this one.
- [x] `printf`: Similar to above. Appears when raising an exception fails and when the GC outputs a warning; format specifiers are not used at all. (`dprintf` is probably more suitable here, as those errors really belong in `STDERR` rather than `STDOUT`.)
- [x] `snprintf`: Similar to above. Sees the most uses:
  - `Crystal::System.print_error` on Windows.
  - `Float::Printer#internal`, when Grisu3 fails. Indirectly affects `Float32#to_s` and `Float64#to_s`. #10913 removes this usage.
  - `String::Formatter#float`, whenever a floating-point format specifier is used. The PR above does not touch this yet (and even then it would cover only `%f` and some cases of `%g`).
- [ ] `strtod`, `strtof`: Affect `String#to_f64?` and `#to_f32?` respectively. As a result of using these functions, conversion from hexfloats is actually possible:
  ```crystal
  "0xa.bp+5".to_f # => 342.0
  ```
  Some numeric specs in the standard library use a custom hexfloat parser instead of this undocumented feature.
- [ ] `strerror`: Appears in `Errno#message`. Might affect specs, and unlike `dprintf` it is also a public API.
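For comparison, parsing hexfloats does not need `strtod` at all. Below is a minimal sketch of such a parser; `parse_hexfloat` is a hypothetical name and not the helper the specs actually use. It accepts an optional sign, a `0x` prefix, hex digits with an optional point, and a required binary exponent. The hex digits form one integer mantissa, and each fractional digit contributes a factor of `2**-4`, so `0xa.bp+5` is `0xab * 2**(5 - 4)`, i.e. `171 * 2 = 342`.

```crystal
# Hypothetical helper: parses hexadecimal floating-point literals such as
# "0xa.bp+5" in plain Crystal, without the locale-dependent strtod.
def parse_hexfloat(str : String) : Float64?
  s = str
  negative = false
  if s.starts_with?('-')
    negative = true
    s = s[1..]
  elsif s.starts_with?('+')
    s = s[1..]
  end
  return nil unless s.starts_with?("0x") || s.starts_with?("0X")
  s = s[2..]

  # The binary exponent after 'p' or 'P' is mandatory.
  p_index = s.index(/[pP]/)
  return nil unless p_index
  exponent = s[(p_index + 1)..].to_i?
  return nil unless exponent

  int_digits, _, frac_digits = s[0, p_index].partition('.')
  digits = int_digits + frac_digits
  # More than 16 hex digits would overflow UInt64, so this sketch rejects them.
  return nil if digits.empty? || digits.size > 16
  return nil unless digits.each_char.all?(&.hex?)
  mantissa = digits.to_u64(16)

  # Each fractional hex digit is worth a factor of 2**-4.
  value = mantissa.to_f64 * 2.0 ** (exponent - 4 * frac_digits.size)
  negative ? -value : value
end

parse_hexfloat("0xa.bp+5") # => 342.0
parse_hexfloat("1,5")      # => nil
```

Scaling by a power of two is exact, so apart from the single rounding of a long mantissa to `Float64` (and results in the subnormal range) the result does not lose precision.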
On Windows it has been noted that FormatMessageW, which appears in WinError#message, is also locale-dependent.
It is questionable, at the very least, that `"1,23".to_f` can succeed merely because of the current user's locale, without the developer explicitly asking for it; that is behavior a developer might never expect.
For anyone else affected by this like me: `LibC.setlocale(LibC::LC_ALL, "en_US.UTF-8")` (the binding from the opening post) somehow did not fix it. This is what worked instead:
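```crystal
# Redefining `main` lets this code run before the Crystal runtime starts.
fun main(argc : Int32, argv : UInt8**) : Int32
  # Any later setlocale(..., "") call reads LC_ALL from the environment,
  # so it now picks up en_US.UTF-8 instead of the user's setting.
  LibC.setenv("LC_ALL", "en_US.UTF-8", 1)
  Crystal.main(argc, argv)
end
```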
This essentially just overrides the user's locale.
Some libc implementations provide variants like `strtod_l` that take an explicit locale, but they are missing on many platforms, so they wouldn't be a real solution.
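For reference, on glibc such a call would look roughly like the sketch below: `newlocale` builds a private locale object with `"C"` numeric conventions, and `strtod_l` parses with it regardless of what `setlocale` set globally. The `LibLocale` bindings, the hypothetical helper `c_locale_to_f64?`, and the mask value `LC_NUMERIC_MASK = 1 << 1` are assumptions that match glibc on Linux only, which is exactly the portability problem.

```crystal
# glibc-only bindings (assumption); other libcs may lack strtod_l entirely.
lib LibLocale
  fun newlocale(category_mask : Int32, locale : UInt8*, base : Void*) : Void*
  fun freelocale(locale : Void*) : Void
  fun strtod_l(nptr : UInt8*, endptr : UInt8**, locale : Void*) : Float64
end

LC_NUMERIC_MASK = 1 << 1 # glibc: LC_NUMERIC == 1

# Hypothetical helper: parses str with "C" numeric conventions,
# independent of the process-wide locale.
def c_locale_to_f64?(str : String) : Float64?
  loc = LibLocale.newlocale(LC_NUMERIC_MASK, "C", Pointer(Void).null)
  raise "newlocale failed" if loc.null?
  begin
    endptr = Pointer(UInt8).null
    value = LibLocale.strtod_l(str, pointerof(endptr), loc)
    # Reject empty input and trailing garbage, like String#to_f64?.
    return nil if endptr == str.to_unsafe || endptr != str.to_unsafe + str.bytesize
    value
  ensure
    LibLocale.freelocale(loc)
  end
end

c_locale_to_f64?("1.23") # => 1.23
c_locale_to_f64?("1,23") # => nil
```

A real implementation would cache the locale object instead of creating one per call, but it would still need a fallback on platforms without `strtod_l`.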
I suppose the best solution is to move to native Crystal implementations of the locale-dependent algorithms.