delta-rs

CI integration test fails randomly: test_restore_by_datetime

Open. ion-elgreco opened this issue 2 years ago.

One of our tests in the CI integration tests (test_restore_by_datetime) sometimes fails at random: https://github.com/delta-io/delta-rs/actions/runs/7044053111/job/19170981661?pr=1924

ion-elgreco, Nov 30 '23

There are three writes to the table in that test. Could all three land within the same millisecond? The test picks up the v1 timestamp and then retrieves the version for it. The error can be thrown if the snapshot version also has the same timestamp.
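A minimal sketch of this hypothesis, assuming the lookup returns the latest version whose commit timestamp is at or before the requested time. `version_for_timestamp` is a hypothetical helper for illustration, not the delta-rs API, and the timestamps are made up.

```rust
/// Hypothetical lookup (not the delta-rs API): returns the latest version whose
/// commit timestamp (ms since the Unix epoch) is at or before `target_ms`.
/// `timestamps[i]` is assumed to be the timestamp of version `i`, sorted ascending.
fn version_for_timestamp(timestamps: &[i64], target_ms: i64) -> Option<usize> {
    // Number of versions committed at or before the target.
    let count = timestamps.partition_point(|&ts| ts <= target_ms);
    // `None` if no version existed yet at `target_ms`.
    count.checked_sub(1)
}

fn main() {
    // Distinct timestamps: asking for v1's timestamp returns v1.
    let distinct = [1_000, 2_000, 3_000, 4_000];
    assert_eq!(version_for_timestamp(&distinct, 2_000), Some(1));

    // v1, v2 and v3 written in the same millisecond: the same query returns v3.
    let colliding = [1_000, 2_000, 2_000, 2_000];
    assert_eq!(version_for_timestamp(&colliding, 2_000), Some(3));

    println!("lookup sketch ok");
}
```

Under this assumption, a same-millisecond collision would make the restore pick a newer version than the one the test expects.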

r3stl355, Nov 30 '23

Interesting results after adding some extra tracing to this test:

All the history timestamps are [1703869558237, 1703869559278, 1703869560403, 1703869561411], showing that the commits are in sequence, roughly 1 second apart. So I doubt the issue can be fixed by adding a gap between writes, and I am going to remove the current delay between writes from the test.
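The gaps between those four history timestamps can be checked directly. A small sketch; `commit_gaps` is a hypothetical helper written for this illustration:

```rust
/// Hypothetical helper: milliseconds between consecutive commit timestamps.
fn commit_gaps(history: &[i64]) -> Vec<i64> {
    history.windows(2).map(|w| w[1] - w[0]).collect()
}

fn main() {
    // Commit timestamps (ms since the Unix epoch) from the traced run above.
    let history = [1703869558237, 1703869559278, 1703869560403, 1703869561411];

    let gaps = commit_gaps(&history);
    println!("{:?}", gaps); // [1041, 1125, 1008]

    // Every gap is positive, so no two commits share a millisecond.
    assert!(gaps.iter().all(|&g| g > 0));
}
```

All gaps are just over a second, which rules out the same-millisecond explanation for this run.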

The test is looking for timestamp 1703869559278, which is the correct value at index 1. I confirmed that it gets converted to the correct DateTime (converting it back to a timestamp gives the correct value: 1703869559278).
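The round-trip check described above can be sketched without external crates. The test's DateTime is presumably chrono's `DateTime<Utc>`; this sketch swaps in `std::time::SystemTime` as a dependency-free stand-in, so it shows the same millisecond round trip but not the exact conversion the test performs. The helper names are hypothetical.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical helper: converts a commit timestamp (ms since the Unix epoch)
/// into a point in time. Stand-in for building the test's DateTime.
fn millis_to_time(ms: u64) -> SystemTime {
    UNIX_EPOCH + Duration::from_millis(ms)
}

/// Hypothetical helper: converts a point in time back into ms since the Unix epoch.
fn time_to_millis(t: SystemTime) -> u128 {
    t.duration_since(UNIX_EPOCH)
        .expect("time is after the Unix epoch")
        .as_millis()
}

fn main() {
    let target_ms: u64 = 1703869559278;
    let round_trip = time_to_millis(millis_to_time(target_ms));
    // The conversion is lossless at millisecond precision.
    assert_eq!(round_trip, target_ms as u128);
    println!("round trip ok: {}", round_trip);
}
```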

So it looks like something is wrong with the lookup logic.

r3stl355, Dec 29 '23

Problem located: IMO this is not a flaky test but a bug.

It starts here, where a new DeltaTable is created without any history/timestamps loaded: https://github.com/delta-io/delta-rs/blob/02e26e5ecd7a8a64609bd21e8cb30e9bad4cc17b/crates/deltalake-core/src/operations/restore.rs#L142C44-L142C44

The consequences surface here: https://github.com/delta-io/delta-rs/blob/02e26e5ecd7a8a64609bd21e8cb30e9bad4cc17b/crates/deltalake-core/src/table/mod.rs#L884C28-L884C49, which in turn leads to https://github.com/delta-io/delta-rs/blob/02e26e5ecd7a8a64609bd21e8cb30e9bad4cc17b/crates/deltalake-core/src/table/mod.rs#L643

Consequence: if your writes are fast enough that the start of the write and the commit to storage fall within the same millisecond, you are fine. Otherwise, restoring to a DateTime will not work as expected and may restore an earlier version.
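One way to model this consequence, as a sketch under loudly stated assumptions: the test queries with the timestamp recorded at the start of v1's write, while the lookup (on a table with no history loaded) compares against a timestamp taken when each commit reached storage, `lag_ms` later. Both helpers and all numbers are hypothetical, and this is an interpretation of the comment above, not the delta-rs code path.

```rust
/// Hypothetical lookup: latest version whose timestamp is at or before `target_ms`.
fn version_for_timestamp(timestamps: &[i64], target_ms: i64) -> Option<usize> {
    timestamps.partition_point(|&ts| ts <= target_ms).checked_sub(1)
}

/// Hypothetical model: each version's lookup timestamp is the time its commit
/// reached storage, `lag_ms` after the timestamp recorded when the write started.
fn storage_times(write_start_ms: &[i64], lag_ms: i64) -> Vec<i64> {
    write_start_ms.iter().map(|&ts| ts + lag_ms).collect()
}

fn main() {
    // Timestamps recorded at the start of each write (made-up values).
    let write_start = [1_000, 2_000, 3_000];
    // The test asks for the time recorded for v1.
    let target = write_start[1];

    // Write and commit within the same millisecond: v1 is found.
    assert_eq!(version_for_timestamp(&storage_times(&write_start, 0), target), Some(1));

    // Commit lands 5 ms later: v1 now looks newer than the target, so v0 is picked.
    assert_eq!(version_for_timestamp(&storage_times(&write_start, 5), target), Some(0));

    println!("lag sketch ok");
}
```

In this model any lag of 1 ms or more shifts the result to the previous version, which matches the "earlier version being restored" symptom.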

I'll submit a fix in the next few days.

r3stl355, Dec 29 '23

I believe @r3stl355 fixed this.

rtyler, Jan 03 '24