dione
dione copied to clipboard
change to writing Avro B-tree blocks in pre-order
Summary
Following the discussion in #70, Looks like we are writing the blocks in the Avro B-tree in a "reverse post-order" order. this causes us to have many hops in the file when we want to iterate over it in sorted order, as the blocks are saved almost in reverse order. Instead, this PR changes the behavior to write the blocks in "pre-order".
following that we can continue with the caching PR (#70).
It seems that somehow we're backward compatible! (as no code changes were required in the "reader" part).
Detailed Description
Instead of writing directly to the inMemoryBuffer ("flush"), we leave all the data in the tree in memory, and only at the end we write to the inMemoryBuffer in "reverse pre-order", update the parent records with the "real" offset, and in the end, reverse and write to the file buffer to get a file with pre-order sorted blocks.
How was it tested?
added a unit test to explicitly verify the file's block order.
Wow... that was fast :) Great job!
Did it improve the iteration performance?
yeah.. it turned out to be relatively easy to implement. this one is the preliminary to the cache to work now #72 . and I tested #72 locally against s3 and the full sorted iteration is now comparable to regular file iteration. so it looks promising.
@eyala - any chance we could build and verify this "new" writing method on large scale?
Yes, I can arrange it. But we should test the cache branch, right?
not sure, currently the only place it is used (caching) is in joinWithIndex
. are you using it?
I mainly want to verify that we're not getting OOM during index file creation.