Compatibility with ES6 / V8

Open cyberphone opened this issue 6 years ago • 12 comments

I'm working with something that possibly will become an IETF standard: https://tools.ietf.org/html/draft-rundgren-json-canonicalization-scheme-02 It depends on number formatting compatible with ES6/V8.

A short test revealed that the Java version of Ryu is not entirely compatible with this scheme. The IEEE 754 bit pattern 0x0000000000000001 (the smallest positive double) is formatted as 5e-324 in ES6/V8 but as 4.9e-324 in Ryu.
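For reference, the Java side of this difference can be reproduced with the JDK alone. The sketch below is only an illustration: the class and method names are made up for this example, and it calls `Double.toString` rather than Ryu itself, on the assumption (stated later in the thread) that Ryu's Java port follows the same specification. Note that Java writes the exponent with an uppercase `E`.

```java
class MinSubnormalDemo {
    // Formats the double whose IEEE 754 bit pattern is `bits` with the JDK's Double.toString.
    static String javaToString(long bits) {
        return Double.toString(Double.longBitsToDouble(bits));
    }

    public static void main(String[] args) {
        // 0x0000000000000001 is the smallest positive subnormal double (Double.MIN_VALUE).
        System.out.println(javaToString(0x0000000000000001L)); // prints 4.9E-324
    }
}
```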

This is not really a bug but could be worth knowing.

https://github.com/cyberphone/json-canonicalization/tree/master/testdata#es6-numbers

cyberphone avatar Jan 13 '19 17:01 cyberphone

@cyberphone btw, sorting properties during JSON serialization is quite an inefficient approach that can eat up all the gains from using the great Ryu algorithm.

plokhotnyuk avatar Jan 13 '19 18:01 plokhotnyuk

@plokhotnyuk well, this "bug" report only deals with the mathematical differences between Ryu and other algorithms including the one in Python, ES6, and Go. I would be a bit concerned if the Go folks swapped their current (partly buggy) algorithm for Ryu. https://github.com/golang/go/issues/29491#issuecomment-453807516

Property sorting is unfortunately unavoidable for obtaining a canonical form.

cyberphone avatar Jan 13 '19 19:01 cyberphone

The Java implementation attempts to follow the specification of Java's Double.toString [1], which says that it must return at least two digits. Out of all two-digit numbers, 4.9e-324 is closest to the exact floating point value in this case, so that's what it returns. The problem described in https://github.com/ulfjack/ryu/issues/83 is that it sometimes doesn't return the closest two-digit number because there's a one-digit number that's shorter. The C implementation doesn't do that, and I have not found any differences between it and Grisu over a fairly large number of tests.
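The distance argument above can be checked with `BigDecimal`, which exposes the exact binary value of a double. This is a minimal sketch with illustrative names (`ClosestTwoDigitDemo`, `distance`), not code from Ryu: it shows that both 5e-324 and 4.9e-324 parse back to the same double, and that 4.9e-324 is the nearer of the two to the exact value.

```java
import java.math.BigDecimal;

class ClosestTwoDigitDemo {
    // Absolute distance between a decimal literal and the exact value of the double d.
    static BigDecimal distance(String decimal, double d) {
        return new BigDecimal(decimal).subtract(new BigDecimal(d)).abs();
    }

    public static void main(String[] args) {
        double min = Double.MIN_VALUE; // exact value is 2^-1074, roughly 4.94e-324

        // Both candidates round-trip to the same double.
        System.out.println(Double.parseDouble("5e-324") == min);   // prints true
        System.out.println(Double.parseDouble("4.9e-324") == min); // prints true

        // 4.9e-324 is closer to the exact value than 5e-324.
        boolean twoDigitIsCloser =
            distance("4.9e-324", min).compareTo(distance("5e-324", min)) < 0;
        System.out.println(twoDigitIsCloser); // prints true
    }
}
```

So the one-digit form is a valid shortest round-trip representation, while the two-digit form is the more accurate one among outputs that satisfy Java's minimum-digits rule.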

It's unfortunate that Java's spec differs from everyone else's in this regard, but what can you do. You could easily add a flag to allow the Java impl to return the other value instead.

[1] https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#toString(double)

There must be at least one digit to represent the fractional part

ulfjack avatar Jan 14 '19 12:01 ulfjack

Thanks for bringing this up. I added a note to the README which hopefully clarifies the current situation. I'll leave this open as a feature request to add a flag to the Java implementation to generate the shortest output rather than outputting at least two digits.

ulfjack avatar Jan 14 '19 13:01 ulfjack

@ulfjack Thanx for the explanation! The 100M test suite fully validated with the Go implementation here: https://github.com/remyoudompheng/go/tree/ryu

Since ES6 JSON number formatting is quite different (-0 is equal to 0, "g" with specific range, forbidden NaN/Infinity), I guess making a dedicated port of your Java code is my best option if I want to achieve the same performance improvement as I experienced in Go? I did a port of another library for .NET which could also use an update: https://github.com/cyberphone/json-canonicalization/tree/master/dotnet/es6numberserializer
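To make those formatting differences concrete, here is a rough sketch of the layout step only, following the rules of ECMAScript's `Number::toString`. Everything in it is hypothetical: the class `Es6NumberLayout`, its methods, and the convention that the caller already has the shortest decimal digits (for example from a Ryu-style algorithm) as a string `digits` plus a position `n` such that the value equals 0.`digits` × 10^n. It is not the code from the linked port, and it needs Java 11 or later for `String.repeat`.

```java
class Es6NumberLayout {
    // Lays out shortest digits in ES6 style: plain decimal notation for a
    // limited exponent range, exponential notation otherwise.
    // Assumption: `digits` has no leading or trailing zeros.
    static String layout(String digits, int n) {
        int k = digits.length();
        if (k <= n && n <= 21) {
            return digits + "0".repeat(n - k);                         // integer, e.g. 100
        }
        if (0 < n && n <= 21) {
            return digits.substring(0, n) + "." + digits.substring(n); // e.g. 1.5
        }
        if (-6 < n && n <= 0) {
            return "0." + "0".repeat(-n) + digits;                     // e.g. 0.000001
        }
        int e = n - 1;
        String mantissa = (k == 1) ? digits : digits.charAt(0) + "." + digits.substring(1);
        return mantissa + "e" + (e < 0 ? "-" : "+") + Math.abs(e);     // e.g. 5e-324
    }

    // Applies the JSON-specific rules before delegating to layout().
    static String toJson(double value, String digits, int n) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("NaN/Infinity are not valid JSON numbers");
        }
        if (value == 0.0) {
            return "0"; // also true for -0.0, which is serialized as 0
        }
        return (value < 0 ? "-" : "") + layout(digits, n);
    }

    public static void main(String[] args) {
        System.out.println(toJson(Double.MIN_VALUE, "5", -323)); // prints 5e-324
        System.out.println(toJson(1e21, "1", 22));               // prints 1e+21
        System.out.println(toJson(-0.0, "", 0));                 // prints 0
    }
}
```

Keeping digit generation separate from layout is only one possible design; as noted further down the thread, such a split can have a measurable performance cost.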

cyberphone avatar Jan 17 '19 05:01 cyberphone

Did a dedicated port in 3 hours and it worked like a charm! Will soon replace the original and pretty complex V8-port done by Mozilla. Great work BTW!

I will refer to Ryu in the next I-D revision as a recommended algorithm for JSON/JCS serialization.

https://github.com/cyberphone/json-canonicalization/blob/master/java/canonicalizer/src/org/webpki/jcs/NumberToJSON.java

cyberphone avatar Jan 17 '19 17:01 cyberphone

Cool, thanks!

ulfjack avatar Jan 17 '19 20:01 ulfjack

Is Ryu moving toward compliance with the ECMAScript specification (https://tc39.es/ecma262/#sec-numeric-types-number-tostring), or not?

croraf avatar Jun 18 '20 20:06 croraf

@croraf that would be cool because then Ryu would be compatible with RFC 8785 (JSON Canonicalization Scheme), which will be published soon (it is in the RFC editors' queue). https://www.rfc-editor.org/authors/rfc8785.htm

cyberphone avatar Jun 19 '20 04:06 cyberphone

I was hoping to provide a low-level API that could be used to generate multiple formats, but at least in the C version, that has a measurable performance cost, which - so far - I haven't been willing to pay. That said, for the C side, adding a compile-time symbol to select between formats (libstdc++ / JSON) certainly seems reasonable. Java doesn't really have compile-time settings, so it would have to be a runtime setting.

Unfortunately, I have been entirely preoccupied with running my own company, with precious little time to dedicate to Ryu. :-/

ulfjack avatar Jun 19 '20 09:06 ulfjack

@ulfjack perhaps the community can do the heavy lifting; the important thing is your decision on the direction to take.

croraf avatar Jun 19 '20 10:06 croraf

I want the code to be fast, and also widely useable. It's been integrated into some C++ standard libraries, so it needs - at least - to be configurable such that it's compatible with that. As I said, adding a compile-time symbol to the C version seems fine to me.

ulfjack avatar Jun 24 '20 20:06 ulfjack