daw_json_link

Consider adopting fast_double_parser

Open lemire opened this issue 3 years ago • 4 comments

I have not investigated the float parsing in daw_json_link, but I wanted to make sure you were aware that we have packaged the fast number parsing routine from the simdjson library into its own single-header library: https://github.com/lemire/fast_double_parser This library provides exact parsing at high speed (under Linux, FreeBSD, macOS, Visual Studio).

Feel free to close this issue if it is not relevant.

lemire avatar Oct 26 '20 14:10 lemire

Right now I have put off doing SIMD for the real number parsing and have gotten good performance by parsing all the available significant digits of the result type, parsing the exponent, and then building the result as s * 10^e. This has the beauty of working at compile time. Your other project, https://github.com/lemire/fast_float, which was referenced for the from_chars interface, probably fits better, as I have no requirement of a trailing zero on the buffer being parsed.
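For readers following along, the approach described above (collect the significant digits into s, track a decimal exponent e, then form s * 10^e) can be sketched roughly as below. This is only an illustration under stated assumptions, not daw_json_link's actual code: the names `parse_real_naive`, `pow10_naive`, and `is_digit` are hypothetical, input validation is omitted, and the final scaling step is not guaranteed to round to the nearest double, which is the gap that exact parsers aim to close.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: true for ASCII decimal digits.
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper: 10^n for n >= 0 by repeated multiplication.
// The result is exact in double only for n <= 22.
constexpr double pow10_naive(int n) {
    double r = 1.0;
    for (int i = 0; i < n; ++i) r *= 10.0;
    return r;
}

// Sketch: parse "[sign]digits[.digits][e[sign]digits]" as s * 10^e.
// Assumes well-formed input; not correctly rounded in general.
constexpr double parse_real_naive(std::string_view sv) {
    std::size_t i = 0;
    bool negative = false;
    if (i < sv.size() && (sv[i] == '-' || sv[i] == '+')) {
        negative = (sv[i] == '-');
        ++i;
    }

    std::uint64_t s = 0;            // significant digits as an integer
    int e = 0;                      // decimal exponent applied to s
    int digits = 0;                 // significant digits consumed so far
    constexpr int max_digits = 19;  // 19 decimal digits always fit in uint64_t

    // Integer part: digits beyond max_digits are dropped but still scale the value.
    for (; i < sv.size() && is_digit(sv[i]); ++i) {
        if (digits < max_digits) {
            s = s * 10 + static_cast<std::uint64_t>(sv[i] - '0');
            if (s != 0) ++digits;   // leading zeros are not significant
        } else {
            ++e;
        }
    }

    // Fractional part: every digit kept shifts the exponent down by one.
    if (i < sv.size() && sv[i] == '.') {
        ++i;
        for (; i < sv.size() && is_digit(sv[i]); ++i) {
            if (digits < max_digits) {
                s = s * 10 + static_cast<std::uint64_t>(sv[i] - '0');
                --e;
                if (s != 0) ++digits;
            }
        }
    }

    // Optional exponent; the written value is capped to avoid int overflow.
    if (i < sv.size() && (sv[i] == 'e' || sv[i] == 'E')) {
        ++i;
        bool exp_negative = false;
        if (i < sv.size() && (sv[i] == '-' || sv[i] == '+')) {
            exp_negative = (sv[i] == '-');
            ++i;
        }
        int written = 0;
        for (; i < sv.size() && is_digit(sv[i]); ++i) {
            if (written < 1000) written = written * 10 + (sv[i] - '0');
        }
        e += exp_negative ? -written : written;
    }

    // Build s * 10^e; dividing by 10^-e avoids the inexact constant 10^-n.
    double value = static_cast<double>(s);
    value = (e >= 0) ? value * pow10_naive(e) : value / pow10_naive(-e);
    return negative ? -value : value;
}

// Everything above is constexpr, so simple inputs can be checked at compile time.
static_assert(parse_real_naive("1.5") == 1.5);
static_assert(parse_real_naive("-2.5e2") == -250.0);
```

With many digits or large exponents, both the conversion of s to double and the scaling step can round, so the result may be an ulp or two away from the nearest double.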

I can try it out, but I don't think I would be able to use it with the current license: my project is licensed under BSL1.0 and they are under Apache 2.0.

beached avatar Oct 26 '20 17:10 beached

  1. The goal of these libraries is to provide exact parsing (to the nearest float). You can, of course, parse numbers faster if you don't aim for the nearest float... but the difference is surprisingly small...

  2. Yes, fast_float and fast_double_parser are similar.

  3. I don't think that there is any SIMD involved. I would love to parse numbers faster with SIMD instructions, but, honestly, I have not found a way that is worth it yet.

  4. If you are only blocked by the licence, I will gladly fix that for you. I use Apache by default, but it is meant to be super liberal. I do not wish to block anyone.

(Note that it is entirely up to you. I just wanted you to be aware of the option.)

lemire avatar Oct 26 '20 17:10 lemire

I would like to look at using compute_float_64 for the non-constexpr code path when the result type is double/float. Right now I am getting about 900-1800MB/s, depending on the parsing context, with a difference from strtod of usually (2/3 of the time) 0ulp, about 1/3 of the time 1ulp, and rarely 2ulp. These figures come from many runs of 1 million random floating point numbers.
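As an aside on how a ulp difference like the one above might be measured, here is a minimal sketch. It assumes IEEE 754 binary64 doubles, and the names `ordered_bits`, `ulp_distance`, and `ulps_from_strtod` are hypothetical; this is not the test harness used for the figures quoted above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>

// Map a double's bit pattern onto an integer line where adjacent doubles
// differ by exactly 1. Assumes IEEE 754 binary64 (sign-magnitude layout).
inline std::int64_t ordered_bits(double d) {
    std::int64_t i;
    std::memcpy(&i, &d, sizeof d);
    // Negative doubles have the sign bit set; map magnitude m to -m.
    return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of representable doubles between a and b (0 means identical value).
inline std::uint64_t ulp_distance(double a, double b) {
    assert(!std::isnan(a) && !std::isnan(b));  // NaN has no meaningful distance
    const std::int64_t oa = ordered_bits(a);
    const std::int64_t ob = ordered_bits(b);
    // Subtract in unsigned arithmetic so the full range cannot overflow.
    return oa >= ob
        ? static_cast<std::uint64_t>(oa) - static_cast<std::uint64_t>(ob)
        : static_cast<std::uint64_t>(ob) - static_cast<std::uint64_t>(oa);
}

// Compare a candidate parse result against strtod on the same text.
inline std::uint64_t ulps_from_strtod(const char* text, double candidate) {
    return ulp_distance(candidate, std::strtod(text, nullptr));
}
```

Calling `ulps_from_strtod` over a large set of random inputs and tallying the results by value would give a distribution like the one described, provided the platform's strtod is itself correctly rounded.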

Due to the nature of the JSON parser, if I have to skip the number but need to parse it later, I use a different parser with information saved from the skip. Either way, the exact-parsing property of your compute_float_64 function is super useful, and with a license compatible with BSL1.0 it could probably help.

beached avatar Oct 26 '20 19:10 beached

@beached Added a secondary license just now (BSL1.0).

lemire avatar Oct 26 '20 20:10 lemire