get_float() to retrieve a 4-byte float is missing.

Open hidalgoss opened this issue 2 years ago • 5 comments

Hi folks!

We need to retrieve a lot of float values from JSON files, but the only get() helper we found in your library is get_double(), which returns an 8-byte double. This is a serious performance issue because of the conversion to float that we then have to do.

Can you help us with this? Can you please tell us whether the library has a function to get a 4-byte float?

Thanks.

hidalgoss avatar Feb 24 '22 17:02 hidalgoss

> This is a serious performance issue due to conversion we need to do to float value

Parsing a string into a float requires hundreds of instructions. Conversion from double to float is a single instruction on most systems. The cost is comparable to an additional multiplication. So it is unlikely to make a measurable difference.

There are other good reasons to add a get_float() function, however. It is a valid issue.

Thanks.

lemire avatar Feb 24 '22 19:02 lemire

Furthermore, a get_float() function should give an error when the value is too large (e.g., 1e300).
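
To illustrate the range check described above, here is a minimal sketch of a checked narrowing step that could be applied to the result of get_double(). The helper name `narrow_to_float` and the use of `std::optional` to signal the error are assumptions made for illustration; they are not part of simdjson.

```cpp
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical helper, not a simdjson API: narrow an already-parsed double
// to a 4-byte float, reporting failure when the magnitude is out of range.
inline std::optional<float> narrow_to_float(double value) {
  constexpr double max_float = std::numeric_limits<float>::max();
  if (std::fabs(value) > max_float) {
    return std::nullopt;  // e.g. 1e300 cannot be represented as a float
  }
  // In range: the conversion rounds to a nearby representable float.
  return static_cast<float>(value);
}
```

With this sketch, `narrow_to_float(1e300)` yields an empty optional, while `narrow_to_float(1.5)` yields `1.5f`. The check is deliberately conservative: it rejects every value whose magnitude exceeds the largest finite float.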

lemire avatar Feb 24 '22 19:02 lemire

Thanks a lot for your quick reply! I'm not sure I understand it.

When you say that string-to-float parsing requires hundreds of instructions, do you mean some internal procedure that lets you convert to double more efficiently than to float? For us, this layer should be abstracted away.

In terms of performance: when we read a JSON file heavily loaded with float values, we need to static_cast every double we retrieve, so in AI applications that use this kind of file the impact is very high. It is not trivial and is clearly measurable. Besides, you have a really great error and exception handling mechanism, which helps when a read goes wrong, for instance for values that do not fit in 4 bytes, as in the case you suggest.

If you consider adding get_float(), can you please give us some idea of when this feature could be ready? It would be very helpful for us to have as detailed a schedule as you can provide. :)

Again, thanks a lot for your support.

hidalgoss avatar Feb 24 '22 19:02 hidalgoss

You should be able to convert doubles into floats at tens of gigabytes per second. It is essentially free compared to anything else you might be doing when ingesting JSON files.
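
As a rough sketch of the conversion being discussed, the following narrows a batch of already-parsed doubles to floats. The function name `to_floats` is hypothetical and not a simdjson API; it assumes the doubles were collected beforehand (for example from get_double()) and does no range checking.

```cpp
#include <vector>

// Illustrative helper, not part of simdjson: narrow a batch of doubles.
inline std::vector<float> to_floats(const std::vector<double>& values) {
  std::vector<float> out;
  out.reserve(values.size());  // one allocation for the whole batch
  for (double v : values) {
    // Each static_cast is a plain floating-point narrowing; on most
    // systems it compiles to a single conversion instruction.
    out.push_back(static_cast<float>(v));
  }
  return out;
}
```

For example, `to_floats({1.0, 2.5, -3.25})` returns the floats `1.0f`, `2.5f` and `-3.25f`.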

We have no timeline at the moment, but if you'd like to sponsor this feature with funding, we could do it faster.

lemire avatar Feb 24 '22 20:02 lemire

In the following blog post, I make the point that it is unlikely that the conversion from double to float can be a performance bottleneck on current commodity processors. The conversion is a single instruction that can be retired once per cycle (on most systems):

https://lemire.me/blog/2022/07/20/how-quickly-can-you-convert-floats-to-doubles-and-back/

lemire avatar Jul 20 '22 17:07 lemire