[CELEBORN-1919] Hardsplit batch tracking should be disabled when pushing only a single replica
What changes were proposed in this pull request?
Remove unnecessary tracking of hard-split batches when only a single replica is pushed.
Why are the changes needed?
When the optimization for reading skewed partitions is enabled, Celeborn typically tracks all failed batches to avoid potential data duplication. However, tracking of hard-split batches can be safely disabled when only a single replica is pushed. In that case, a batch that receives a HARD_SPLIT response is never written to the partition data file, so it will definitely not be written to its previous partition location. Celeborn therefore does not need to track these batches, and tracking them anyway could overload the Driver.
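The tracking decision described above can be sketched as a small predicate. This is a minimal sketch, assuming the skewed-partition read optimization is enabled; `PushStatus`, `shouldTrackFailedBatch`, and the `replicateEnabled` flag are hypothetical names chosen for illustration, not Celeborn's actual classes or configuration keys. The replication branch reflects the caveat raised later in this thread: with replication, the primary may already have written the batch before the replica answers HARD_SPLIT.

```java
// Hypothetical sketch: PushStatus and shouldTrackFailedBatch are illustrative names,
// not Celeborn's real API. Assumes the skewed-partition read optimization is enabled.
public class HardSplitTrackingSketch {

    // Simplified push response outcomes (hypothetical; not Celeborn's StatusCode enum).
    enum PushStatus { SUCCESS, HARD_SPLIT, FAILED }

    // Returns true if the client should report this batch to the Driver as a failed batch.
    static boolean shouldTrackFailedBatch(PushStatus status, boolean replicateEnabled) {
        switch (status) {
            case SUCCESS:
                // Written successfully: nothing to deduplicate later.
                return false;
            case HARD_SPLIT:
                // Single replica: a HARD_SPLIT batch never reaches the old partition file,
                // so tracking it would only add load on the Driver.
                // With replication: the primary may already have written the batch before
                // the replica answered HARD_SPLIT, so the batch must still be tracked.
                return replicateEnabled;
            default:
                // Other push failures keep the existing behavior: track every failed batch.
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldTrackFailedBatch(PushStatus.HARD_SPLIT, false)); // false
        System.out.println(shouldTrackFailedBatch(PushStatus.HARD_SPLIT, true));  // true
        System.out.println(shouldTrackFailedBatch(PushStatus.FAILED, false));     // true
    }
}
```

Keeping the rule in one predicate makes the single-replica exception explicit while leaving every other failure path unchanged.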
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manual test & Pass GA
cc @wangshengjie123
In which scenarios would the batch of a PushDataRequest that received a HARD_SPLIT response be appended to the PartitionLocation data file? @RexXiong
Tracking of hard-split batches can be safely disabled when pushing a single replica, because the data is never written to the partition data file. When push replication is enabled, however, the data may still be written to the partition file: the replica partition may respond with a HARD_SPLIT status after the primary partition has already written the data.
However, when a partition encounters a HardSplit, the affected batches need to be resubmitted, and these resubmitted batches will definitely not be written to their previous partition locations. Therefore, Celeborn does not need to track these batches; tracking them anyway could overload the Driver.
Maybe the description needs to be modified. Even if batches with the same BatchID fall into different PartitionLocations, we still need the failed batch information to deduplicate them.
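To illustrate the reviewer's point, here is a minimal sketch of reader-side filtering with failed-batch information keyed by location. It assumes, as with the skewed-partition read optimization, that copies of the same batch in different PartitionLocations may be consumed by independent readers that cannot share a "seen batch" set. `BatchKey`, `StoredBatch`, `reportFailed`, `shouldRead`, and the location ids are hypothetical names, not Celeborn's actual classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names are illustrative, not Celeborn's real API.
// Models why readers still need failed-batch info when duplicate copies
// of a batch land in different PartitionLocations.
public class FailedBatchFilterSketch {

    // Identity of a pushed batch, independent of where it was written.
    record BatchKey(int mapId, int attemptId, int batchId) {}

    // A batch as stored in one PartitionLocation's data file.
    record StoredBatch(String locationId, BatchKey key) {}

    // locationId -> batches the client reported as failed when pushing to that location.
    private final Map<String, Set<BatchKey>> failedByLocation = new HashMap<>();

    void reportFailed(String locationId, BatchKey key) {
        failedByLocation.computeIfAbsent(locationId, id -> new HashSet<>()).add(key);
    }

    // Read a stored batch unless its push to this specific location was reported as failed.
    boolean shouldRead(StoredBatch batch) {
        return !failedByLocation.getOrDefault(batch.locationId(), Set.of()).contains(batch.key());
    }

    public static void main(String[] args) {
        FailedBatchFilterSketch filter = new FailedBatchFilterSketch();
        BatchKey key = new BatchKey(7, 0, 3);
        // The push to "loc-A" was reported failed, but the bytes still landed in its file;
        // the client then resent the same batch to the split location "loc-B".
        filter.reportFailed("loc-A", key);
        List<StoredBatch> stored = List.of(new StoredBatch("loc-A", key), new StoredBatch("loc-B", key));
        List<StoredBatch> kept = new ArrayList<>();
        for (StoredBatch b : stored) {
            if (filter.shouldRead(b)) {
                kept.add(b);
            }
        }
        System.out.println(kept.size()); // 1: only the loc-B copy is read
    }
}
```

Without the entry for "loc-A", both copies would pass the filter and the batch would be read twice, even though the two copies live in different PartitionLocations.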
@FMX @wangshengjie123 Could you help to review this?