ryu
Compatibility with ES6 / V8
I'm working with something that possibly will become an IETF standard: https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-02 It depends on number formatting compatible with ES6/V8.
A short test revealed that the Java version of Ryu is not entirely compatible with this scheme. The double with bit pattern 0x0000000000000001 (the smallest positive subnormal) is formatted as 5e-324 in ES6/V8 but as 4.9e-324 in Ryu.
This is not really a bug but could be worth knowing.
https://github.com/cyberphone/json-canonicalization/tree/master/testdata#es6-numbers
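To make the discrepancy concrete, here is a minimal Python sketch (not Ryu itself) that reinterprets the bit pattern from the report as a double. It assumes only the standard `struct` module and the fact that Python's `repr` prints the shortest round-tripping digits. Both candidate strings parse back to the same double, so the disagreement is only about which string to print.

```python
import struct

# Reinterpret the 64-bit pattern 0x0000000000000001 as an IEEE 754 double.
bits = 0x0000000000000001
value = struct.unpack(">d", struct.pack(">Q", bits))[0]

# Both candidate strings parse back to exactly the same double.
assert float("5e-324") == value
assert float("4.9e-324") == value

# Python's repr picks the shortest digit string that round-trips.
shortest = repr(value)
print(shortest)
```

On CPython 3 this prints `5e-324`, which matches the ES6/V8 result quoted above.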
@cyberphone btw, sorting of properties during serialization of JSON is a quite inefficient approach, that can eat all gain of using the great Ryu algorithm.
@plokhotnyuk well, this "bug" report only deals with the mathematical differences between Ryu and other algorithms including the one in Python, ES6, and Go. I would be a bit concerned if the Go folks swapped their current (partly buggy) algorithm for Ryu. https://github.com/golang/go/issues/29491#issuecomment-453807516
Property sorting is unfortunately unavoidable for obtaining a canonical form.
The Java implementation attempts to follow the specification of Java's Double.toString [1], which says that it must return at least two digits. Out of all two-digit numbers, 4.9e-324 is closest to the exact floating point value in this case, so that's what it returns. The problem described in https://github.com/ulfjack/ryu/issues/83 is that it sometimes doesn't return the closest two-digit number because there's a one-digit number that's shorter. The C implementation doesn't do that, and I have not found any differences between it and Grisu over a fairly large number of tests.
It's unfortunate that Java's spec differs from everyone else in this regard, but what can you do. You could easily add a flag to allow the Java impl to return the other value instead.
[1] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#toString(double)
There must be at least one digit to represent the fractional part
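The "closest two-digit number" claim can be checked with exact decimal arithmetic. The following is a rough sketch using Python's `decimal` module, not the Java implementation; the candidate list and the helper name `distance` are chosen here purely for illustration.

```python
from decimal import Decimal

value = 5e-324          # smallest positive subnormal double
exact = Decimal(value)  # exact binary value, roughly 4.94065645841e-324

def distance(text):
    """Absolute distance between a decimal string and the exact double value."""
    return abs(Decimal(text) - exact)

# Two-digit candidates around the exact value (illustrative selection).
two_digit_candidates = ["4.8e-324", "4.9e-324", "5.0e-324"]
closest = min(two_digit_candidates, key=distance)
print(closest)
```

This selects `4.9e-324`: the exact value lies about 0.04e-324 above 4.9e-324 but about 0.06e-324 below 5.0e-324, which is why a formatter that must emit two digits picks 4.9e-324 even though the single digit 5 already round-trips.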
Thanks for bringing this up. I added a note to the README which hopefully clarifies the current situation. I'll leave this open as a feature request to add a flag to the Java implementation to generate the shortest output rather than outputting at least two digits.
@ulfjack Thanks for the explanation! The 100M test suite fully validated with the Go implementation here: https://github.com/remyoudompheng/go/tree/ryu
Since ES6 JSON number formatting is quite different (-0 is serialized as 0, "g"-style formatting applies only within a specific range, and NaN/Infinity are forbidden), I guess making a dedicated port of your Java code is my best option if I want to achieve the same performance improvement as I experienced in Go? I did a port of another library for .NET which also could use an update: https://github.com/cyberphone/json-canonicalization/tree/master/dotnet/es6numberserializer
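For readers unfamiliar with the ES6 rules mentioned here, the sketch below shows one way the layout step could look once the shortest digits are known. It is an illustration under assumptions, not the port discussed in this thread: the function name `es6_number_to_json` is hypothetical, and the shortest digits come from Python's `repr` rather than from Ryu. The thresholds (plain notation up to 21 integer digits, down to six leading fractional zeros) follow the ECMAScript Number-to-string rules.

```python
import math
from decimal import Decimal

def es6_number_to_json(value: float) -> str:
    """Format a double the way ES6 Number-to-string would (hypothetical helper)."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError("NaN and Infinity are not valid JSON numbers")
    if value == 0:
        return "0"  # also covers -0.0

    # repr() yields the shortest round-tripping digits; Decimal splits them up.
    sign, digit_tuple, exp = Decimal(repr(value)).as_tuple()
    all_digits = "".join(str(d) for d in digit_tuple)
    n = len(all_digits) + exp        # position of the decimal point
    digits = all_digits.rstrip("0")  # significant digits only
    k = len(digits)
    prefix = "-" if sign else ""

    if k <= n <= 21:                 # integer: pad with zeros
        body = digits + "0" * (n - k)
    elif 0 < n <= 21:                # decimal point inside the digits
        body = digits[:n] + "." + digits[n:]
    elif -6 < n <= 0:                # small fraction: leading zeros
        body = "0." + "0" * (-n) + digits
    else:                            # exponential notation
        e = n - 1
        mantissa = digits if k == 1 else digits[0] + "." + digits[1:]
        body = mantissa + "e" + ("+" if e > 0 else "-") + str(abs(e))
    return prefix + body

for sample in (-0.0, 5e-324, 1e20, 1e21, 1e-7, 123.456):
    print(es6_number_to_json(sample))
```

A real port would feed the digit and exponent pair produced by Ryu directly into the layout step instead of going through `repr` and `Decimal`.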
Did a dedicated port in 3 hours and it worked like a charm! Will soon replace the original and pretty complex V8-port done by Mozilla. Great work BTW!
I will refer to Ryu in the next I-D revision as a recommended algorithm for JSON/JCS serialization.
https://github.com/cyberphone/json-canonicalization/blob/master/java/canonicalizer/src/org/webpki/jcs/NumberToJSON.java
Cool, thanks!
Is Ryu moving toward compliance with the ECMAScript specification (https://tc39.es/ecma262/#sec-numeric-types-number-tostring)?
@croraf that would be cool because then Ryu would be compatible with RFC 8785 (JSON Canonicalization Scheme), which will be published soon (it is in the RFC Editor's queue). https://www.rfc-editor.org/authors/rfc8785.htm
I was hoping to provide a low-level API that could be used to generate multiple formats, but at least in the C version, that has a measurable performance cost, which - so far - I haven't been willing to pay. That said, for the C side, adding a compile-time symbol to select between formats (libstdc++ / JSON) certainly seems reasonable. Java doesn't really have compile-time settings, so it would have to be a runtime setting.
Unfortunately, I have been entirely preoccupied with running my own company, with precious little time to dedicate to Ryu. :-/
@ulfjack perhaps the community can do the heavy lifting; the important thing is your decision on the direction to take.
I want the code to be fast, and also widely useable. It's been integrated into some C++ standard libraries, so it needs - at least - to be configurable such that it's compatible with that. As I said, adding a compile-time symbol to the C version seems fine to me.