Functions that depend on the current C locale

Open HertzDevil opened this issue 3 years ago • 4 comments

Some LibC funs, like strtod in String#to_f64?, and snprintf in Float::Printer#internal, depend on the currently active C locale. This means some oddities could happen if a different locale is active:

lib LibC
  LC_ALL = 6

  fun setlocale(category : Int, locale : Char*) : Char*
end

"1,23".to_f64? # => nil
1.23.to_s      # => "1.23"
1.23.to_s.to_f # => 1.23
"%g" % 1.23    # => "1.23"                   # `String::Formatter#float` also uses `snprintf`
1e23.to_s      # => "9.9999999999999992e+22" # Grisu failure case, triggers the `snprintf` path

# the decimal and thousands separators are swapped in German
LibC.setlocale(LibC::LC_ALL, "de_DE.UTF-8")

"1,23".to_f64? # => 1.23
1.23.to_s      # => "1.23"
1.23.to_s.to_f # Invalid Float64: "1.23" (ArgumentError)
"%g" % 1.23    # => "1,23"
1e23.to_s      # => "9,9999999999999992e+22"

All programs start with the "C" locale, but third-party shards might nonetheless change it, leading to these hard-to-debug scenarios. To my understanding, the entire Crystal standard library should be locale-independent.

Is there anything we could do here apart from reimplementing all of LibC's locale-dependent functions in Crystal? (There are probably other kinds of global state in the C runtime to avoid too.)
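One stopgap short of reimplementing anything is to pin the numeric locale around a sensitive call. The sketch below is only an illustration: `LibCLocale`, `current_numeric_locale` and `with_c_numeric_locale` are hypothetical names, and `LC_NUMERIC = 1` is assumed from glibc/musl (other platforms use different values). Since `setlocale` changes process-wide state, this is not safe when other threads format or parse numbers at the same time.

```crystal
# Hypothetical binding, kept in its own lib to avoid clashing with LibC.
lib LibCLocale
  LC_NUMERIC = 1 # assumption: glibc/musl value, differs on other platforms

  fun setlocale(category : LibC::Int, locale : LibC::Char*) : LibC::Char*
end

# Returns the name of the active LC_NUMERIC locale, or nil if the query fails.
# Passing a null locale pointer only queries; it does not change anything.
def current_numeric_locale : String?
  ptr = LibCLocale.setlocale(LibCLocale::LC_NUMERIC, nil)
  # Copy the name right away: a later setlocale call may invalidate the pointer.
  ptr.null? ? nil : String.new(ptr)
end

# Runs the block with LC_NUMERIC forced to "C", then restores the previous value.
def with_c_numeric_locale(&)
  saved = current_numeric_locale
  LibCLocale.setlocale(LibCLocale::LC_NUMERIC, "C")
  begin
    yield
  ensure
    LibCLocale.setlocale(LibCLocale::LC_NUMERIC, saved) if saved
  end
end

value = with_c_numeric_locale { "1.23".to_f64? }
puts value
```

Calling `current_numeric_locale` on its own can also help when debugging, as it shows whether some dependency has already switched away from "C".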

HertzDevil avatar Mar 30 '22 11:03 HertzDevil

Here is a list of non-Windows functions I could find that depend on the C locale:

  • [x] dprintf: Affects floating-point formatting specifiers (%a %e %f %g). Appears in Crystal::System.print_error, which is used in many places but does not have any floats. Locale-dependent behavior might affect compiler specs but I don't think there is a big need to replace this one.
  • [x] printf: Similar to above. Appears when failing to raise an exception and when the GC outputs a warning; formatting specifiers are not used at all. (dprintf is probably more suitable here, as those errors really should belong in STDERR rather than STDOUT.)
  • [x] snprintf: Similar to above. Sees the most uses:
    • Crystal::System.print_error on Windows
    • Float::Printer#internal, when Grisu3 fails. Indirectly affects Float32#to_s and Float64#to_s. #10913 removes this usage.
    • String::Formatter#float, whenever a floating-point formatting specifier is used. The PR above does not touch this yet (even then it would cover only %f and some cases of %g).
  • [ ] strtod, strtof: Affects String#to_f32? and #to_f64? respectively. As a result of using these functions, conversion from hexfloats is actually possible:
    "0xa.bp+5".to_f # => 342.0
    
    Some numeric specs in the standard library use a custom hexfloat parser instead of this undocumented feature.
  • [ ] strerror: Appears in Errno#message. Might affect specs, and is also a public API, unlike dprintf.

On Windows it has been noted that FormatMessageW, which appears in WinError#message, is also locale-dependent.

HertzDevil avatar Mar 31 '22 12:03 HertzDevil

It is at least questionable that "1,23".to_f could ever succeed just based on the current user's locale, without explicitly asking for this, which is something the developer might never have expected.

So for anyone else affected by this like me: LibC.setlocale(LibC::LC_ALL, "en_US.UTF-8") (code from the starting post) somehow did not fix this. Instead, this is what worked:

fun main(argc : Int32, argv : UInt8**) : Int32
	LibC.setenv("LC_ALL", "en_US.UTF-8", 1)
	Crystal.main(argc, argv)
end

So essentially this just overwrites the user's locale.

phil294 avatar Aug 16 '22 17:08 phil294

Some libc implementations provide functions like strtod_l that allow specifying a locale, but they're missing on many platforms. So it wouldn't be a real solution.

I suppose the best solution is to move to native implementations for the locale-dependent algorithms.
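To give a rough idea of what a native, locale-independent path could look like, here is a minimal sketch for a deliberately small subset of inputs. It is not how the standard library does or would do it: `parse_simple_decimal` is a hypothetical helper that only accepts plain `digits[.digits]` with at most 15 digits in total and returns nil for everything else, so a caller would still need a fallback. Within that subset the result is correctly rounded, because both the digit value (below 2^53) and the power of ten (at most 10^14) are exactly representable in a Float64, and a single IEEE division rounds correctly.

```crystal
# Hypothetical helper, not part of the standard library.
# Parses "[+-]digits[.digits]" without touching any LibC function, so the
# decimal separator is always "." regardless of the active C locale.
def parse_simple_decimal(str : String) : Float64?
  # [0-9] instead of \d, so that only ASCII digits are accepted.
  match = /\A([+-]?)([0-9]+)(?:\.([0-9]+))?\z/.match(str)
  return nil unless match

  int_digits = match[2]
  frac_digits = match[3]? || ""
  digits = int_digits + frac_digits

  # Any integer with at most 15 decimal digits is exactly representable.
  return nil if digits.size > 15

  mantissa = digits.to_u64

  # Build 10^(number of fractional digits); every intermediate value is exact.
  scale = 1.0
  frac_digits.size.times { scale *= 10.0 }

  # Exact operands and a single division give a correctly rounded result.
  value = mantissa.to_f64 / scale
  match[1] == "-" ? -value : value
end

puts parse_simple_decimal("1.23").inspect
puts parse_simple_decimal("1,23").inspect
```

Exponents, hexfloats, infinities and longer digit strings are intentionally out of scope here; handling them with correct rounding is where a full algorithm would be needed.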

straight-shoota avatar Oct 06 '23 12:10 straight-shoota

s2d.c and s2f.c from the Ryu repository seem to contain replacements for strtod and strtof. I do not know whether the algorithms have their own names, or whether they actually differ from other C runtimes' implementations apart from locale support.

HertzDevil avatar Dec 19 '23 17:12 HertzDevil