Core: Fix Race Condition in ScanTaskIterable
About the changes
Solves https://github.com/apache/iceberg/issues/14823
After thinking about this in more depth (thanks @amogh-jahagirdar for adding and pointing out the log line, it was crucial), a corner case occurred to me: what if we find the queue empty, but before we can check whether active workers > 0, a worker adds to the queue and exits? The consumer would then see an empty queue and zero active workers, and finish while a task is still sitting in the queue. This is especially likely when we have just 1 record and more than 1 worker spawned.
What I did is the following:
- Always check the active worker count first; only if it is 0, proceed and check whether taskQueue is empty, as the last step.
- Check isDone() when polling from taskQueue returns false.
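
To illustrate the ordering described above, here is a minimal, self-contained sketch. It is not the actual ScanTaskIterable code: the class `QueueDrainSketch`, its `startWorker` and `next` methods, and the 10 ms poll timeout are all hypothetical names and values chosen for illustration. It assumes each worker is registered before it starts and publishes to the queue before decrementing the active worker count.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the check ordering; not the real ScanTaskIterable.
class QueueDrainSketch {
  private final BlockingQueue<String> taskQueue = new LinkedBlockingQueue<>();
  private final AtomicInteger activeWorkers = new AtomicInteger(0);

  // Register the worker on the caller's thread, then let it publish its items.
  void startWorker(List<String> items) {
    activeWorkers.incrementAndGet();
    Thread worker = new Thread(() -> {
      try {
        taskQueue.addAll(items);
      } finally {
        // Decrement only after publishing, so a reader that observes 0
        // is guaranteed to also observe everything this worker enqueued.
        activeWorkers.decrementAndGet();
      }
    });
    worker.start();
  }

  // Worker count first, queue emptiness last. Checking in the opposite order
  // leaves a window where a worker enqueues and exits between the two reads.
  boolean isDone() {
    if (activeWorkers.get() > 0) {
      return false;
    }
    return taskQueue.isEmpty();
  }

  // Returns the next item, or null once all workers exited and the queue is drained.
  String next() {
    try {
      while (true) {
        String item = taskQueue.poll(10, TimeUnit.MILLISECONDS);
        if (item != null) {
          return item;
        }
        // The poll came back empty: re-check completion instead of assuming it.
        if (isDone()) {
          return null;
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

With one worker that produces a single item and a second worker that produces nothing (the 1 record, more than 1 worker case), `next()` in this sketch returns the item first and only then returns null, regardless of how the threads interleave.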
Side note: I will harden the test coverage for this iterator; I am working on an exhaustive suite for it.
Previous discussion: I tried running the test 100 times (all pass): ./gradlew :iceberg-core:test --tests "org.apache.iceberg.rest.TestRESTScanPlanning.scanPlanningWithBatchScan"
The only reproducer I got was running the whole suite, which @gaborkaszab shared (thanks a ton for this):
for i in {1..20}; do
./gradlew :iceberg-core:test --tests org.apache.iceberg.rest.TestRESTScanPlanning --rerun
done
After patching this way, it works for me repeatedly.
Wow, the flakiness still exists; still checking.
for i in {1..20}; do
./gradlew :iceberg-core:test --tests org.apache.iceberg.rest.TestRESTScanPlanning --rerun
done
Passes with this patch.
Thanks everyone for the review!