
RationalTime value mismatch when serialize/deserialize to JSON

Open rdelillo opened this issue 1 year ago • 7 comments

Required:


  • [X] I believe this isn't a duplicate topic
  • [X] This report is not related to an adapter

For adapter related issues, please go to the appropriate repository - likely in the OpenTimelineIO github organization. For general questions and help please use the Academy Software Foundation slack, #opentimelineio.

Select One:

  • [ ] Build problem
  • [X] Incorrect Functionality or bug
  • [ ] New feature or functionality

Description

I'm having a RationalTime accuracy issue when serializing/deserializing a Clip as JSON. The discrepancy is very small, but enough to produce a value comparison mismatch on my end.

Is that expected? Has anyone spotted this before? Is there something else I'm not understanding?

(tested with opentimelineio-0.17.0)

Reproduction Steps


import opentimelineio as otio

rate_23_976 = 23.976024627685547

# Create a random clip
start = otio.opentime.RationalTime(30, 24.0)
src_range = otio.opentime.TimeRange(
  start.rescaled_to(rate_23_976),
  otio.opentime.RationalTime(0, rate_23_976)
)
clip = otio.schema.Clip(name="test", source_range=src_range)

# Recreate same clip through serialization/deserialization to JSON
clip2 = otio.schema.Clip.from_json_string(clip.to_json_string())

# Comparison
assert clip2.source_range.start_time == clip.source_range.start_time  # I'm expecting this to be True but raises
assert clip.source_range.start_time.rate == clip2.source_range.start_time.rate  # OK
assert clip.source_range.start_time.value == clip2.source_range.start_time.value # raises
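For contrast, a plain-Python JSON round trip of the same value is lossless, since Python's `json` module emits `repr(float)`, which since Python 3.1 is the shortest string that round-trips exactly (a small sketch, independent of OTIO):

```python
import json

value = 29.970030784606934  # a serialized start_time value from this repro

# Python's json module emits repr(float), which round-trips exactly
assert json.loads(json.dumps(value)) == value
```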

rdelillo avatar Jan 15 '25 12:01 rdelillo

Notes on RapidJSON for whoever takes this issue on: from the RapidJSON documentation, it seems round-trip precision is not guaranteed, given the "maximum 3 ULP error" note below. Have we looked at round-tripping previously, or do we already have unit tests for it?

## Parsing to Double {#ParsingDouble}

Parsing a string into `double` is difficult. The standard library function `strtod()` can do the job, but it is slow. By default, the parsers use the normal precision setting. This has a maximum 3 [ULP](http://en.wikipedia.org/wiki/Unit_in_the_last_place) error and is implemented in `internal::StrtodNormalPrecision()`.

When using `kParseFullPrecisionFlag`, the parsers call `internal::StrtodFullPrecision()` instead, and this function actually implements 3 conversion methods:
1. [Fast-Path](http://www.exploringbinary.com/fast-path-decimal-to-floating-point-conversion/).
2. Custom DIY-FP implementation as in [double-conversion](https://github.com/floitsch/double-conversion).
3. Big Integer Method as in (Clinger, William D. How to read floating point numbers accurately. Vol. 25. No. 6. ACM, 1990).

If the first conversion method fails, it will try the second, and so on.

## Double-to-String conversion {#dtoa}

Originally, RapidJSON used `snprintf(..., ..., "%g")` for double-to-string conversion. This is not accurate, as the default precision is 6. It is also slow, and there is an alternative.

Google's V8 [double-conversion](https://github.com/floitsch/double-conversion) implemented a newer, fast algorithm called Grisu3 (Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM Sigplan Notices 45.6 (2010): 233-243.).

However, since it is not header-only, we implemented a header-only version of Grisu2. This algorithm guarantees that the result is always accurate, and in most cases it produces the shortest (optimal) string representation.

The header-only conversion function has been evaluated in [dtoa-benchmark](https://github.com/miloyip/dtoa-benchmark).

meshula avatar Jan 16 '25 06:01 meshula

There are round trip tests, however they are testing integer value/rate combinations: https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/d6fff83a769797fe6b491d16193808b4ae585f6f/tests/test_serializable_object.py#L30

When I serialize the example clip:

{
    "OTIO_SCHEMA": "RationalTime.1",
    "rate": 23.976024627685547,
    "value": 29.970030784606934
}

If I print the clip and clip2 values, I see the difference is in the last digit:

29.970030784606937
29.970030784606934
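The two printed values appear to be adjacent, or nearly adjacent, doubles; `math.ulp` (Python 3.9+) can measure the distance, which is consistent with RapidJSON's stated "maximum 3 ULP" parse error (a quick sketch using the values above):

```python
import math

a = 29.970030784606937  # value before the round trip
b = 29.970030784606934  # value after the round trip

# distance in units-in-the-last-place; RapidJSON's normal-precision
# parser promises at most 3 ULP of error
ulps = round((a - b) / math.ulp(a))
assert 1 <= ulps <= 3
```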

Interestingly, at least on my Mac, if I change the pasted example to use a computed value:

rate_23_976 = 24 * 1000/1001

Then the assertions pass.
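As an aside (my observation, not confirmed in the thread): the literal 23.976024627685547 looks like the single-precision rounding of 24000/1001, which would explain why it differs from the computed double-precision rate:

```python
import struct

# round 24000/1001 through a 32-bit float and back to a double
rate_f32 = struct.unpack('f', struct.pack('f', 24000 / 1001))[0]

assert rate_f32 == 23.976024627685547  # the literal from the repro
assert rate_f32 != 24 * 1000 / 1001    # differs from the double-precision rate
```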

ssteinbach avatar Mar 31 '25 22:03 ssteinbach

Looks like RapidJSON is giving us a fixed number of decimal digits of precision (15) in the serialization. That means precision falls off for each power of 10 away from the origin.
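That 15-digit behavior can be demonstrated directly in Python (a sketch using the value from the repro; 17 significant digits is the minimum guaranteed to round-trip an IEEE binary64 double):

```python
x = 29.970030784606937  # the pre-serialization value from the repro

# 15 significant digits (what RapidJSON appears to emit) loses the last ulp here
assert float(f"{x:.15g}") != x

# 17 significant digits always round-trips a binary64 double
assert float(f"{x:.17g}") == x
```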

Long term, two reasonable ways to make the precision of floating point numbers predictable in serialization would be to switch OTIO from a primarily ASCII format to a binary-first format with an ASCII representation (like USD), or to serialize floats as strings.

The problem with serializing floats as strings is that we use the JSON type conventions to imply type in metadata fields. The binary format is probably the better option, since it is more robust to alternate parsers: even if we find a JSON library with better floating point handling to replace RapidJSON, there could be freestanding OTIO readers that don't have the same rigor.
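If strings were ever chosen, one lossless option worth noting (a sketch, not a proposal from the thread) is hexadecimal float notation, which Python exposes via `float.hex()` / `float.fromhex()` and C99 via `%a`; it round-trips any double exactly:

```python
x = 29.970030784606937

# hex float notation encodes the exact binary mantissa and exponent,
# so the round trip is lossless by construction
s = x.hex()
assert float.fromhex(s) == x
```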

ssteinbach avatar Apr 01 '25 07:04 ssteinbach

Would we want to consider epsilon comparisons in the equality operator of RationalTime?

rogernelson avatar Apr 09 '25 15:04 rogernelson

There is a method for that:

https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/d87606277f415bb66e1877a8caf6c10bbeb381e4/src/opentime/rationalTime.h#L101

Which is also exposed in python:

https://github.com/AcademySoftwareFoundation/OpenTimelineIO/blob/d87606277f415bb66e1877a8caf6c10bbeb381e4/src/py-opentimelineio/opentime-bindings/opentime_rationalTime.cpp#L86

I don't think the default == should be an approximate equal though.

ssteinbach avatar Apr 09 '25 16:04 ssteinbach

That function relies on the user supplying their own epsilon, though, which I think can be hard for a lot of users to calculate.

Since a reasonable epsilon value is context-dependent, we could perhaps help users by choosing a standard value, similar to how C++ handles this for double precision with std::numeric_limits.

If we consider a tick to be 1/rate, then something within, say, 1/1000th of a tick (i.e. 1/rate/1000) could be considered effectively equal. If we want a standard value across all rates, then perhaps the rate should be set to the maximum reasonable rate (120, I would guess?).

We could mimic C++ numeric_limits and use that as the default value for almost_equal instead of 0, even if we want to keep == meaning exactly equal.
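A minimal sketch of that proposal (names like `MAX_REASONABLE_RATE`, `default_epsilon`, and `almost_equal` here are illustrative, not existing OTIO API):

```python
MAX_REASONABLE_RATE = 120.0  # the guess from the comment above

def default_epsilon(rate: float = MAX_REASONABLE_RATE) -> float:
    # 1/1000th of a tick, where a tick is 1/rate
    return 1.0 / rate / 1000.0

def almost_equal(a: float, b: float, epsilon: float = None) -> bool:
    if epsilon is None:
        epsilon = default_epsilon()
    return abs(a - b) <= epsilon

# the two values from the round-trip mismatch above differ by ~3e-15,
# far below the default epsilon of ~8.3e-6
assert almost_equal(29.970030784606937, 29.970030784606934)
```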

rogernelson avatar Apr 09 '25 17:04 rogernelson

haha, Roger, beware what you ask for! Stephan and I have been researching the question. https://github.com/ssteinbach/ordinate_precision_research

meshula avatar Apr 10 '25 04:04 meshula