spark-acid
Issue-70 Fix the repartitioning logic to handle statement IDs
For UPDATE/DELETE, we were repartitioning on the encoded bucketId so that all rows with the same bucket would be processed by the same task. However, rows can have the same bucket but different encoded bucketIds, because an encoded bucketId is composed of both the bucket and the statementId. As a result, rows with the same bucket could end up in different tasks, which can cause conflicts because different tasks would write to the same delete delta bucket file. CPed from SPAR-4637
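
The problem can be illustrated with a minimal sketch. It is not the code from this commit: the bit layout is an assumption modelled on Hive's BucketCodec V1 (codec version in the top 3 bits, bucket in bits 16-27, statementId in bits 0-11), and `encode`, `decode_bucket`, and `assign_task` are hypothetical helpers, with a simple modulo standing in for the partitioner.

```python
# Assumed layout, modelled on Hive's BucketCodec V1 (not taken from this commit):
#   bits 29-31: codec version, bits 16-27: bucket, bits 0-11: statementId
CODEC_VERSION = 1
BUCKET_SHIFT = 16
BUCKET_MASK = 0x0FFF0000


def encode(bucket, statement_id):
    """Hypothetical helper: pack bucket and statementId into one encoded bucketId."""
    return (CODEC_VERSION << 29) | (bucket << BUCKET_SHIFT) | statement_id


def decode_bucket(encoded):
    """Hypothetical helper: recover the plain bucket, dropping the statementId."""
    return (encoded & BUCKET_MASK) >> BUCKET_SHIFT


def assign_task(key, num_tasks):
    """Stand-in for a partitioner: maps a key to a task index."""
    return key % num_tasks


# Three rows in the same bucket (3), written by different statements (0, 1, 2).
rows = [encode(3, 0), encode(3, 1), encode(3, 2)]

# Partitioning on the encoded bucketId spreads the rows over several tasks.
tasks_by_encoded = {assign_task(e, 4) for e in rows}

# Partitioning on the decoded bucket keeps them in a single task.
tasks_by_bucket = {assign_task(decode_bucket(e), 4) for e in rows}
```

Under these assumptions, keying the repartition on the decoded bucket rather than the raw encoded value ensures a single task writes each delete delta bucket file.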