fast_float

optimized (smaller) lookup table for float (binary32 only)

Open jrahlf opened this issue 4 years ago • 6 comments

Have you considered optimizing the code size for parsing floats? The LUT power_of_five_128 has approximately 1400 entries, which are needed for parsing doubles. I don't know how many entries are required for parsing a float, but I suspect the LUT could be a lot smaller in that case.

If there was a separate LUT for parsing floats, the compiled binary size could be reduced significantly.

jrahlf avatar Aug 29 '21 14:08 jrahlf

Pull requests invited!

lemire avatar Aug 29 '21 14:08 lemire

To be clear here, if I understand correctly, @jrahlf wants an implementation that supports only binary32 numbers (float). Squeezing the table is easy: one can simply follow through the paper at https://arxiv.org/abs/2101.11408

Of course, the net result will only support binary32 numbers.

lemire avatar Aug 30 '21 17:08 lemire

My mistake then, and I just confirmed that this would have off-by-1 values, which would mess up the logic.

Alexhuszagh avatar Aug 30 '21 17:08 Alexhuszagh

https://github.com/fastfloat/fast_float/blob/8c4405e76e8bdac4246eb9973e75bdc7962c8dd5/include/fast_float/fast_table.h#L34 If you change these to float, the table size shrinks from 1302 to 208, i.e. you can save approximately 8kB. So one could add another table power_of_five_128 for float and then let the templatized code use the correct table.
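
A minimal sketch of how templated code might pick a table per type. The trait `powers_table` and the helper `table_bytes` are hypothetical names for illustration, not part of fast_float; only the entry counts (1302 and 208) come from this thread, and 64-bit table entries are an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical trait: one specialization per floating-point type.
// Only the entry counts are taken from the thread; the real tables
// would live here as arrays of 64-bit words.
template <typename T> struct powers_table;

template <> struct powers_table<double> {
  static constexpr std::size_t size = 1302;  // entries needed for double
};

template <> struct powers_table<float> {
  static constexpr std::size_t size = 208;   // entries needed for float
};

// Templated code asks the trait for "its" table, so instantiating only
// the float path never names the double table.
template <typename T>
constexpr std::size_t table_bytes() {
  return powers_table<T>::size * sizeof(std::uint64_t);
}

static_assert(table_bytes<double>() - table_bytes<float>() == 8752,
              "difference between the two tables, in bytes");
```

Under these assumptions the difference is 8752 bytes, in line with the estimate of approximately 8kB.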

There is one catch: if you used both double and float, the code size would be greater (worse) than when only providing the double table. Two possible solutions:
a) a compile-time option for when the user only wants to parse floats, in which case from_chars<double> is disabled
b) clever data packing so that only the float part of the table gets compiled into the binary if only from_chars<float> is used.
I am assuming here that the float table is a sub range of the double table. Is this correct, @lemire ?

jrahlf avatar Sep 05 '21 16:09 jrahlf

I am assuming here that the float table is a sub range of the double table.

Yes, it is.

lemire avatar Sep 05 '21 20:09 lemire
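
The subrange relation can be sketched with a toy table. Everything here is illustrative: `wide_table`, the exponent bounds, and `narrow_power` are hypothetical names and values, not fast_float's actual data. The sketch only shows that a narrower type can index into the wider table through an offset; it does not by itself make the linker drop the unused part of the table.

```cpp
#include <cstddef>
#include <cstdint>

// Toy stand-in for the wide (double) table: 5^0 .. 5^7.
constexpr std::uint64_t wide_table[] = {1, 5, 25, 125, 625, 3125, 15625, 78125};
constexpr int wide_min_exp = 0;  // exponent stored at wide_table[0]

// Hypothetical exponent range needed by the narrow (float) type.
constexpr int narrow_min_exp = 2;
constexpr int narrow_max_exp = 5;
constexpr std::size_t narrow_size =
    static_cast<std::size_t>(narrow_max_exp - narrow_min_exp + 1);

// The narrow table is a window into the wide one, starting at a fixed offset.
constexpr const std::uint64_t* narrow_begin() {
  return wide_table + (narrow_min_exp - wide_min_exp);
}

// Look up 5^exp for an exponent inside the narrow range.
constexpr std::uint64_t narrow_power(int exp) {
  return narrow_begin()[exp - narrow_min_exp];
}

static_assert(narrow_power(2) == 25, "first entry of the narrow window");
static_assert(narrow_power(5) == 3125, "last entry of the narrow window");
```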

So I got a proof of concept: #103. I added the files example_test_float.cpp and example_test_mixed.cpp. With the HEAD version, the file sizes are as follows (Ubuntu, gcc 9.3):

34072 Sep 12 14:43 tests/example_test
34072 Sep 12 14:43 tests/example_test_float
42656 Sep 12 14:43 tests/example_test_mixed  <-- not ideal

With the separate float LUT the sizes are:

34072 Sep 12 14:47 tests/example_test
25880 Sep 12 14:47 tests/example_test_float   <-- saves 8kB as expected
42744 Sep 12 14:47 tests/example_test_mixed

There are two notable things:

  • The extra float LUT only increases the mixed file size by about 100 bytes (88, going by the listings above), which is unexpected (in a good way). I expected an increase of 208 * 8 bytes = 1.6kB.
  • The mixed file is about 8kB larger than the double file. Heavy inlining might not be ideal for the mixed case (regarding code size). E.g. readelf shows that fast_float::parse_long_mantissa has a code size of 4kB and is instantiated for both float and double.
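
The arithmetic behind these observations can be checked directly from the two listings. The variable names below are made up for this check; the numbers are the file sizes quoted above, and 8 bytes per table entry is the assumption used in the estimate.

```cpp
// File sizes in bytes, copied from the two listings (HEAD vs. separate float LUT).
constexpr long head_double = 34072, head_float = 34072, head_mixed = 42656;
constexpr long lut_double = 34072, lut_float = 25880, lut_mixed = 42744;

// Saving for a float-only binary once the float LUT is separate.
constexpr long float_only_saving = head_float - lut_float;

// Growth of the mixed binary caused by the extra float LUT.
constexpr long mixed_growth = lut_mixed - head_mixed;

// Growth one would expect if the whole float LUT were simply added.
constexpr long expected_growth = 208 * 8;

// How much larger the mixed binary is than the double-only binary.
constexpr long mixed_over_double = lut_mixed - lut_double;

static_assert(head_double == lut_double, "double-only binary is unchanged");
static_assert(float_only_saving == 8192, "exactly 8 KiB saved");
static_assert(mixed_growth == 88, "observed growth of the mixed binary");
static_assert(expected_growth == 1664, "about 1.6kB expected");
static_assert(mixed_over_double == 8672, "mixed vs. double-only");
```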

I would prefer to make the double LUT a composite of the float LUT and additional data, but reading a composite object as one linear array would violate C++ aliasing rules. :( However, this might be solvable with std::bit_cast ...

Overall, it might make sense to always use either double or float and not mix the types when parsing numbers.

jrahlf avatar Sep 12 '21 13:09 jrahlf