Store floats exactly in Pandas DataFrames; require exact equality for Pandas DataFrames to be considered equal
Pandas to_json by default stores 10 decimal places of floats. This leads to precision loss. The fix is not completely trivial, because storing many more decimal places would not guarantee recoverability of very small floats. A better way to store floats exactly as strings is to use repr, which will store the float in as few digits as possible while maintaining that float(repr(value)) = value.
- Use
reprto encode floats as strings without precision loss when storing DataFrames as json - Store dtype info so that data can be recovered exactly. (Otherwise, there would be no way of knowing that "3.1234" should be a float but "3_1" is an int, sine 3.1234 and 3_1 are both valid numbers in Python).
- Add DataFrame encoders and decoders
- Require DataFrames to be exactly equal by default for equality checks to pass
A downside of this change is that it stores the format in which DataFrames are encoded, so now the DataFrame decoder must support both the old and new formats.
Test plan
- Added a unit test that reproduces the behavior and ensures that DataFrames stored in both the old and new formats are properly decoded.
- Added test cases for exact vs. approximate DataFrame equality.
I encountered a case where an experiment had float64 trial data where the data changed values after being saved and reloaded. It looks like the data was converted to float32 and then back to float64
Is this something that happened on the Ax side? I've run into floating point precision issues in the past (https://github.com/facebook/Ax/pull/1538), ideally we could fix this at the root of the issue.
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Is this something that happened on the Ax side? I've run into floating point precision issues in the past (#1538), ideally we could fix this at the root of the issue.
Yeah, this had to do with pandas to_json. I'm not sure if this would have avoided the issue behind #1538 -- depends whether the numbers involved came from serializing a DataFrame.
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request was exported from Phabricator. Differential Revision: D47922407
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@esantorella has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Codecov Report
Patch coverage: 96.72% and project coverage change: -0.01% :warning:
Comparison is base (
feab1f3) 94.38% compared to head (683f460) 94.38%.
:exclamation: Current head 683f460 differs from pull request most recent head 3194dd0. Consider uploading reports for the commit 3194dd0 to get more accurate results
:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.
Additional details and impacted files
@@ Coverage Diff @@
## main #1750 +/- ##
==========================================
- Coverage 94.38% 94.38% -0.01%
==========================================
Files 454 454
Lines 42485 42515 +30
==========================================
+ Hits 40099 40127 +28
- Misses 2386 2388 +2
| Files Changed | Coverage Δ | |
|---|---|---|
| ax/storage/json_store/decoder.py | 97.22% <ø> (-0.05%) |
:arrow_down: |
| ax/storage/json_store/encoder.py | 98.11% <ø> (-0.07%) |
:arrow_down: |
| ax/storage/sqa_store/encoder.py | 96.72% <ø> (ø) |
|
| ax/utils/common/testutils.py | 80.34% <ø> (ø) |
|
| ax/storage/sqa_store/tests/test_sqa_store.py | 98.78% <50.00%> (-0.22%) |
:arrow_down: |
| ax/core/data.py | 95.26% <100.00%> (ø) |
|
| ax/storage/json_store/encoders.py | 97.83% <100.00%> (+0.09%) |
:arrow_up: |
| ax/storage/json_store/registry.py | 100.00% <100.00%> (ø) |
|
| ax/storage/json_store/tests/test_json_store.py | 94.41% <100.00%> (+0.29%) |
:arrow_up: |
| ax/storage/sqa_store/decoder.py | 93.97% <100.00%> (ø) |
|
| ... and 5 more |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.