quickwit icon indicating copy to clipboard operation
quickwit copied to clipboard

source: preserve Position arg in ObjectUriBatchReader

Open noam-sol opened this issue 5 months ago • 4 comments

Description

Before the fix, we lose the granularity between Beginning and offset 0 which affects the output checkpoint_delta.

How was this PR tested?

Added a UT

noam-sol avatar Sep 28 '25 13:09 noam-sol

This is not the right fix. It's a bug in the checkpoint delta logic in the source. If you print the batch checkpoint deltas in that unit test, you will see something like:

for batch in batches {
            println!("batch checkpoint delta: {:?}", batch.checkpoint_delta);
        }
batch checkpoint delta: ∆(file:///var/folders/cj/3nfybfc572b9s2qvd18qzh_m0000gp/T/.tmpDz99t1:(00000000000000000000..~00000000000000000350])
batch checkpoint delta: ∆(file:///var/folders/cj/3nfybfc572b9s2qvd18qzh_m0000gp/T/.tmp6o6nbz:(00000000000000000070..~00000000000000000350])

The first checkpoint delta should be (beginning..~00000000000000000350].

Do you want to take a stab at it?

guilload avatar Sep 29 '25 09:09 guilload

ahh I see, on this line - https://github.com/quickwit-oss/quickwit/blob/a58b6de9542db62455888dfc9de65da901c64831/quickwit/quickwit-indexing/src/source/doc_file_reader.rs#L161

we lose the granularity between Beginning and offset 0; however I don't feel like properly fixing it is worth the risk of breaking something

noam-sol avatar Sep 29 '25 15:09 noam-sol

This is the right fix and we're not going to break anything. I will not merge this PR as is.

guilload avatar Sep 29 '25 15:09 guilload

@guilload I updated the branch to fix the issue as you suggested, and added a UT as well. Thanks for the feedback! Do you have any other ideas on how to test this?

noam-sol avatar Sep 30 '25 15:09 noam-sol