flatbuffers icon indicating copy to clipboard operation
flatbuffers copied to clipboard

[all languages][1.10][1.11] Flatc deseralizes float to different value

Open seifertm opened this issue 6 years ago • 16 comments

Consider the following schema Data.fbs and corresponding input data data-in.json:

table a {
    b:float;
}

root_type a;
{
  b: 0.5078125
}

If I serialize the JSON and deserialize it again, I would expect the number to be the same. However, the resulting float value becomes 0.507812:

$ flatc --binary --raw-binary Data.fbs data-in.json
$ cp data-in.bin data-out.bin
$ flatc --json --raw-binary --defaults-json Data.fbs -- data-out.bin
$ cat data-out.json
{
  b: 0.507812
}

An online floating point converter the binary representations seem to be off 1 bit: 0.5078125: 00111111000000100000000000000000 0.507812 : 00111111000000011111111111111000

The issue is reproducible with flatbuffers v1.10.0 and v1.11.0.

Shouldn't float32 precision be sufficient to do one serialization round trip without rounding errors?

seifertm avatar May 25 '19 09:05 seifertm

The conversion from binary float to decimals certainly isn't lossless, the question is whether the printer and parser can be made to do the exact same rounding for these?

It can also be that the printer typically drops the last digit since it is typically "garbage", i.e. forces 1.0 or whatever to have a non-0 digit at the end.

@vglavnyy may know better.

aardappel avatar May 30 '19 23:05 aardappel

The current implementation of JSON serializer (bin to json) use fixed (6 and 12) precision specifier. https://github.com/google/flatbuffers/blob/3a88e1031be9ad97459c4de4f77319e19cbb648d/src/idl_gen_text.cpp#L63-L68

https://github.com/google/flatbuffers/blob/3a88e1031be9ad97459c4de4f77319e19cbb648d/include/flatbuffers/util.h#L179-L185

You can change these constants.

Probably, float_format attribute orJSON-option for GenerateText method will be a good solution. Previous discussion: #4589.

Performance: I hope the {fmt} library will be accepted into C++20 (P0645).

vglavnyy avatar May 31 '19 16:05 vglavnyy

Thank you! This does explain what's going on in my tests. From my side, this issue can be closed. It would be nice, if this could be configured at runtime, though.

I created a PR to document this behavior, in case anyone else runs into this.

seifertm avatar Jun 07 '19 06:06 seifertm

This issue has been automatically marked as stale because it has not had activity for 1 year. It will be automatically closed if no further activity occurs. To keep it open, simply post a new comment. Maintainers will re-open on new activity. Thank you for your contributions.

stale[bot] avatar Jun 06 '20 07:06 stale[bot]

Is there any resolution to this issue planned? The current behavior maps 40% of the dynamic range of a float (10^-6 to 10^-38) to one single value, 0.0. For doubles it's almost 50% of the range that gets incorrectly mapped.

I don't think "default incorrect behavior with an option to make it correct" would be a good approach, just to be backwards compatible.

threebrooks avatar Aug 08 '20 04:08 threebrooks

Reopening to find a better long time solution.

dbaileychess avatar Jan 16 '21 01:01 dbaileychess

@vglavnyy We would like to look at this again, specifically for json. Is your last comment on this issue (https://github.com/google/flatbuffers/issues/5371#issuecomment-497772425) still the correct path to take?

dbaileychess avatar Jan 16 '21 01:01 dbaileychess

There is the only way - use hexadecimal literals as input (already supported) and output format. I use hexadecimal format when export JSON from CAD plugin then read this JSON using flatbuffers parser without any rounding errors. I know about "Correctly Rounded Binary-Decimal and Decimal-Binary Conversions" by David M. Gay and know about the Ryu algorithm. Also, the C++17 standard introduced to_chars/from_chars methods with round-trip safety guarantee. But the only MSVC compiler has support of to_chars<float/double>(). and I'm not sure about its round-trip safety. I've tested MSVC to_chars based on the Ryu algorithm, it works. But GCC and Clang don't have it, so it is impossible to use it inside flatbuffers.

With decimal presentation there a lot of problems with denormalized values (except 0). Any precision control options would not help. For example:

  • denormalized 0x0.000002p-126 - this single precision value requires 47 decimal digits for printing (~ 1.40129846432e-45).
  • normal 0x1.000002p0 == 1.00000011920928955078125 (23 digits). These two values are valid IEEE-754 floating-point values. This is output of to_chars with fixed format and default precision:
  • 0x0.000002p-126 -> 1e-45
  • 0x1.000002p0 -> 1.0000001192092896 This is very good (perfect) result, but not as precise as the original hexadecimal literals.

Summary:

  • if someone needs exact float-to-text conversion the hexadecimal format is the only way,
  • for generic float-to-text conversion better to use to_chars or use the Ryu implementation.

vglavnyy avatar Jan 16 '21 11:01 vglavnyy

Just a quick note from my side, for the purposes of Tensorflow model conversion, a simple switch to std::scientific in FloatToString() did the trick and fixed all my problems while staying readable in the JSON format.

threebrooks avatar Jan 16 '21 14:01 threebrooks

a simple switch to std::scientific in FloatToString() did the trick and fixed all my problems

As a temporary workaround I propose to add --float_format option with a precision field:

  • flatc --float-format hex
  • flatc --float-format fixed<:ulp-precision>
  • flatc --float-format science<:ulp-precision>
  • flatc --float-format general<:ulp-precision>

Here the optional argument <:ulp-precision> is a kind of precision. This is coarseness or reciprocal precision in terms of decimal ULPs. Then greater is ulp-precision parameter the shorter the fractional part will be. This is the example of ulp-based NumToString.

In this example precision of the fractional part is controlled by binary-ULP or ULP. This is the output:

0.125f, 0.125, 1 0.50781f, 0.50781, 1 0.507812f, 0.507812, 1 0.5078125f, 0.5078125, 1 5.078125f, 5.078125, 1 507.8125f, 507.8125, 1 2.718f, 2.7179999, 1 3.14159f, 3.1415901, 1 3.1415926f, 3.1415925, 1 .314159f, 0.31415901, 1 .00314159f, 0.0031415899, 1 .0000314159f, 0.000031415901, 1 .00000000000000314159f, 0.0000000000000031415899, 1

This approach (workaround) can be easily implemented in flatbuffers. This must work with scientific notation as well. But I'm sure that using std::to_chars or the Ryu will be the best solution.

vglavnyy avatar Jan 16 '21 17:01 vglavnyy

As a temporary workaround I propose to add --float_format option with a precision field: flatc --float-format hex flatc --float-format fixed<:ulp-precision> flatc --float-format science<:ulp-precision> flatc --float-format general<:ulp-precision>

My reading of this is that serializing with something like

template<class T> void SerializeFloat(T value, char *buffer) {
   sprintf(buffer, "%.*g", std::numeric_limits<T>::max_digits10, value);
}

(or an equivalent implementation using streams and precision/format settings) is guaranteed to be round-trip invariant.

Maybe there could be another flag value, e.g. flatc --float-format safe that makes that guarantee?

akabel avatar Jan 16 '21 19:01 akabel

Maybe there could be another flag value, e.g. flatc --float-format safe that makes that guarantee?

I think this mode might be a default mode. There are two problems:

  1. printed value is an exact presentation of a binary IEEE-754 floating-point value.
  2. printed value is sufficient to restore a binary floating-point value (round-trip guarantee).

The first problem has a simple solution - the hexadecimal format. This is the shortest and exact format. The second could be solved with general format + std::numeric_limits<T>::max_digits10. This can be extended with the precision control option. But the flatbuffers library can't state a round-trip guarantee because it depends on the standard run-time library.

There is the third problem when a printed value doesn't match with an input value (JSON or field's default value in the scheme) if the input value doesn't have the exact IEEE-754 representation. For example: input json { e: 2.718 } will be printed as { e: 2.71799994 } or as { e: 1.5be76c000p+1 }. One can use the precision control option to adjust the output value (with additional rounding error).

To solve this issue need to integrate sprintf("%.*g") and optionally add --float_precision option.

vglavnyy avatar Jan 17 '21 05:01 vglavnyy

Yeah its a shame the compilers are moving so slow on std::to_chars, would be really nice to use it.

I'd be fine with some --float-precision options. Would also be nice to have a good summary of what @vglavnyy just explained in the docs we can point at.. because I am sure most people do not remember the intricacies of why these trade-offs exist..

aardappel avatar Jan 19 '21 20:01 aardappel

This issue is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.

github-actions[bot] avatar Sep 10 '21 20:09 github-actions[bot]

As a first step for users that need to preserve bits, I'd suggest a flag that outputs hex floats, since we already parse them. Maybe --float-format hex as suggested by @vglavnyy so that later the other options can be added.

aardappel avatar Dec 02 '21 17:12 aardappel

So JSON doesn't support hexadecimal numbers: https://www.json.org/json-en.html So I don't think printing them out as HEX is the correct way to handle this.

dbaileychess avatar Sep 13 '22 04:09 dbaileychess

This issue is stale because it has been open 6 months with no activity. Please comment or label not-stale, or this will be closed in 14 days.

github-actions[bot] avatar Mar 14 '23 20:03 github-actions[bot]

This issue was automatically closed due to no activity for 6 months plus the 14 day notice period.

github-actions[bot] avatar Mar 29 '23 20:03 github-actions[bot]