
Core: Fix Race Condition in ScanTaskIterable

Open singhpk234 opened this issue 3 weeks ago • 1 comments

About the changes

Solves https://github.com/apache/iceberg/issues/14823

After thinking about this in more depth (thanks @amogh-jahagirdar for adding and pointing out the log line, it was crucial), a corner case crossed my mind: what if we see the queue as empty, but by the time we check whether active workers > 0, a worker has produced to the queue and exited? This is especially likely here because we have just 1 record and more than 1 worker spawned.

What I did is the following:

  1. Always check the active worker count first; only if it is 0, proceed to check whether taskQueue is empty, as the last thing.
  2. Check isDone() when polling from taskQueue returns false.
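
For illustration, here is a minimal, self-contained sketch of the check ordering described above. It is not the actual ScanTaskIterable code: the class `DrainOrderSketch`, the method `drain`, and the 10 ms poll timeout are hypothetical, and it assumes each worker enqueues its result before decrementing the active worker counter. Under that assumption, seeing 0 active workers first means the emptiness check that follows is final.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the Iceberg implementation.
public class DrainOrderSketch {
  private final BlockingQueue<String> taskQueue = new LinkedBlockingQueue<>();
  private final AtomicInteger activeWorkers = new AtomicInteger();

  // Workers first, queue last: once no worker is active, nothing more can be
  // enqueued, so an empty queue really means the iteration is finished.
  // The reverse order can miss an item enqueued between the two checks.
  private boolean isDone() {
    return activeWorkers.get() == 0 && taskQueue.isEmpty();
  }

  // Returns the next item, or null once all workers exited and the queue is drained.
  private String next() throws InterruptedException {
    while (true) {
      String item = taskQueue.poll(10, TimeUnit.MILLISECONDS);
      if (item != null) {
        return item;
      }
      if (isDone()) {
        return null;
      }
    }
  }

  // Starts `workers` threads; only the first `items` of them produce a record.
  // Returns how many records the consumer observed.
  public static int drain(int workers, int items) {
    DrainOrderSketch sketch = new DrainOrderSketch();
    sketch.activeWorkers.set(workers); // registered before any worker starts
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      for (int i = 0; i < workers; i++) {
        final int id = i;
        pool.execute(() -> {
          try {
            if (id < items) {
              sketch.taskQueue.add("task-" + id);
            }
          } finally {
            // Decrement only after the enqueue above (assumption of this sketch).
            sketch.activeWorkers.decrementAndGet();
          }
        });
      }
      int seen = 0;
      while (sketch.next() != null) {
        seen++;
      }
      return seen;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Mirrors the scenario in this PR: 1 record, more than 1 worker.
    System.out.println(drain(4, 1));
  }
}
```

With the two conditions in `isDone()` swapped, the consumer could observe an empty queue, then have the last worker enqueue and exit, then observe 0 active workers and stop with the record still unread.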

SideNote: I will harden the coverage for this iterator; I am working on an exhaustive suite for it.


Prev discussion: I tried running the test 100 times (all runs pass): ./gradlew :iceberg-core:test --tests "org.apache.iceberg.rest.TestRESTScanPlanning.scanPlanningWithBatchScan"

The only reproducer I got was running the whole suite, which @gaborkaszab shared (thanks a ton for this):

for i in {1..20}; do
./gradlew :iceberg-core:test --tests org.apache.iceberg.rest.TestRESTScanPlanning --rerun
done

After patching this way, it works for me repeatedly.

singhpk234 avatar Dec 11 '25 19:12 singhpk234

Wow, the flakiness still exists; still checking.

for i in {1..20}; do
./gradlew :iceberg-core:test --tests org.apache.iceberg.rest.TestRESTScanPlanning --rerun
done

passes with this patch

singhpk234 avatar Dec 11 '25 20:12 singhpk234

Thanks everyone for the review!

singhpk234 avatar Dec 16 '25 19:12 singhpk234