Ax icon indicating copy to clipboard operation
Ax copied to clipboard

Store floats exactly in Pandas DataFrames; require exact equality for Pandas DataFrames to be considered equal

Open esantorella opened this issue 2 years ago • 8 comments

Pandas to_json by default stores 10 decimal places of floats. This leads to precision loss. The fix is not completely trivial, because storing many more decimal places would not guarantee recoverability of very small floats. A better way to store floats exactly as strings is to use repr, which will store the float in as few digits as possible while maintaining that float(repr(value)) = value.

  • Use repr to encode floats as strings without precision loss when storing DataFrames as json
  • Store dtype info so that data can be recovered exactly. (Otherwise, there would be no way of knowing that "3.1234" should be a float but "3_1" is an int, sine 3.1234 and 3_1 are both valid numbers in Python).
  • Add DataFrame encoders and decoders
  • Require DataFrames to be exactly equal by default for equality checks to pass

A downside of this change is that it stores the format in which DataFrames are encoded, so now the DataFrame decoder must support both the old and new formats.

Test plan

  • Added a unit test that reproduces the behavior and ensures that DataFrames stored in both the old and new formats are properly decoded.
  • Added test cases for exact vs. approximate DataFrame equality.

esantorella avatar Jul 30 '23 14:07 esantorella

I encountered a case where an experiment had float64 trial data where the data changed values after being saved and reloaded. It looks like the data was converted to float32 and then back to float64

Is this something that happened on the Ax side? I've run into floating point precision issues in the past (https://github.com/facebook/Ax/pull/1538), ideally we could fix this at the root of the issue.

Balandat avatar Jul 30 '23 18:07 Balandat

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jul 31 '23 17:07 facebook-github-bot

Is this something that happened on the Ax side? I've run into floating point precision issues in the past (#1538), ideally we could fix this at the root of the issue.

Yeah, this had to do with pandas to_json. I'm not sure if this would have avoided the issue behind #1538 -- depends whether the numbers involved came from serializing a DataFrame.

esantorella avatar Jul 31 '23 17:07 esantorella

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jul 31 '23 21:07 facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D47922407

facebook-github-bot avatar Jul 31 '23 22:07 facebook-github-bot

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Jul 31 '23 22:07 facebook-github-bot

@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 01 '23 00:08 facebook-github-bot

Codecov Report

Patch coverage: 96.72% and project coverage change: -0.01% :warning:

Comparison is base (feab1f3) 94.38% compared to head (683f460) 94.38%.

:exclamation: Current head 683f460 differs from pull request most recent head 3194dd0. Consider uploading reports for the commit 3194dd0 to get more accurate results

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1750      +/-   ##
==========================================
- Coverage   94.38%   94.38%   -0.01%     
==========================================
  Files         454      454              
  Lines       42485    42515      +30     
==========================================
+ Hits        40099    40127      +28     
- Misses       2386     2388       +2     
Files Changed Coverage Δ
ax/storage/json_store/decoder.py 97.22% <ø> (-0.05%) :arrow_down:
ax/storage/json_store/encoder.py 98.11% <ø> (-0.07%) :arrow_down:
ax/storage/sqa_store/encoder.py 96.72% <ø> (ø)
ax/utils/common/testutils.py 80.34% <ø> (ø)
ax/storage/sqa_store/tests/test_sqa_store.py 98.78% <50.00%> (-0.22%) :arrow_down:
ax/core/data.py 95.26% <100.00%> (ø)
ax/storage/json_store/encoders.py 97.83% <100.00%> (+0.09%) :arrow_up:
ax/storage/json_store/registry.py 100.00% <100.00%> (ø)
ax/storage/json_store/tests/test_json_store.py 94.41% <100.00%> (+0.29%) :arrow_up:
ax/storage/sqa_store/decoder.py 93.97% <100.00%> (ø)
... and 5 more

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Aug 01 '23 00:08 codecov-commenter