dione icon indicating copy to clipboard operation
dione copied to clipboard

change to writing Avro B-tree blocks in pre-order

Open uzadude opened this issue 2 years ago • 5 comments

Summary

Following the discussion in #70, Looks like we are writing the blocks in the Avro B-tree in a "reverse post-order" order. this causes us to have many hops in the file when we want to iterate over it in sorted order, as the blocks are saved almost in reverse order. Instead, this PR changes the behavior to write the blocks in "pre-order".

following that we can continue with the caching PR (#70).

It seems that somehow we're backward compatible! (as no code changes were required in the "reader" part).

Detailed Description

Instead of writing directly to the inMemoryBuffer ("flush"), we leave all the data in the tree in memory, and only at the end we write to the inMemoryBuffer in "reverse pre-order", update the parent records with the "real" offset, and in the end, reverse and write to the file buffer to get a file with pre-order sorted blocks.

How was it tested?

added a unit test to explicitly verify the file's block order.

uzadude avatar Jan 02 '23 13:01 uzadude

Wow... that was fast :) Great job!

shay1bz avatar Jan 03 '23 05:01 shay1bz

Did it improve the iteration performance?

shay1bz avatar Jan 03 '23 05:01 shay1bz

yeah.. it turned out to be relatively easy to implement. this one is the preliminary to the cache to work now #72 . and I tested #72 locally against s3 and the full sorted iteration is now comparable to regular file iteration. so it looks promising.

@eyala - any chance we could build and verify this "new" writing method on large scale?

uzadude avatar Jan 03 '23 13:01 uzadude

Yes, I can arrange it. But we should test the cache branch, right?

eyala avatar Jan 03 '23 14:01 eyala

not sure, currently the only place it is used (caching) is in joinWithIndex. are you using it? I mainly want to verify that we're not getting OOM during index file creation.

uzadude avatar Jan 03 '23 15:01 uzadude