carbondata
[CARBONDATA-3976] CarbonData Update operation enhancement
Why is this PR needed?
Before updating, the Update operation cleans up delta files (see cleanUpDeltaFiles(carbonTable, false)). This cleanup traverses the metadata path and the segment paths many times. When a table has many files, this overhead grows and the update takes longer.
What changes were proposed in this PR?
In cleanUpDeltaFiles, several file-listing calls are nearly identical, for example updateStatusManager.getUpdateDeltaFilesList(segment, false, CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true, allSegmentFiles, true) and updateStatusManager.getUpdateDeltaFilesList(segment, false, CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true, allSegmentFiles, true). They differ only in the file type, yet each call traverses the segment path, so the same path is traversed twice. This PR merges them so the segment path is traversed only once.
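The idea of merging the two lookups can be sketched in plain Java. This is a minimal illustration, not the PR's actual change: FakeSegmentDir, filesWithExt and filesByExt are hypothetical helpers, the extension strings are assumed values (the real ones are defined in CarbonCommonConstants), and an in-memory list stands in for the segment directory so that each traversal can be counted. The extra parameters of getUpdateDeltaFilesList are ignored here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DeltaFileListingSketch {

    // Assumed extension values for illustration only; the real constants
    // live in CarbonCommonConstants and may differ.
    static final String UPDATE_DELTA_EXT = ".carbondata";
    static final String UPDATE_INDEX_EXT = ".carbonindex";

    // Hypothetical stand-in for a segment directory: every listing
    // increments a counter so the number of traversals can be observed.
    static final class FakeSegmentDir {
        private final List<String> files;
        int listCalls = 0;

        FakeSegmentDir(List<String> files) {
            this.files = files;
        }

        List<String> list() {
            listCalls++;
            return files;
        }
    }

    // Before: one traversal per file type, so two types cost two traversals.
    static List<String> filesWithExt(FakeSegmentDir dir, String ext) {
        List<String> out = new ArrayList<>();
        for (String f : dir.list()) {
            if (f.endsWith(ext)) {
                out.add(f);
            }
        }
        return out;
    }

    // After: a single traversal that buckets files by every requested type.
    static Map<String, List<String>> filesByExt(FakeSegmentDir dir, String... exts) {
        Map<String, List<String>> out = new HashMap<>();
        for (String ext : exts) {
            out.put(ext, new ArrayList<>());
        }
        for (String f : dir.list()) {
            for (String ext : exts) {
                if (f.endsWith(ext)) {
                    out.get(ext).add(f);
                    break; // assumes no extension is a suffix of another
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "part-0-1.carbondata", "part-0-1.carbonindex",
                "part-0-2.carbondata", "segment.metadata");

        FakeSegmentDir separate = new FakeSegmentDir(files);
        filesWithExt(separate, UPDATE_DELTA_EXT);
        filesWithExt(separate, UPDATE_INDEX_EXT);
        System.out.println("separate calls: " + separate.listCalls); // prints "separate calls: 2"

        FakeSegmentDir merged = new FakeSegmentDir(files);
        Map<String, List<String>> grouped =
                filesByExt(merged, UPDATE_DELTA_EXT, UPDATE_INDEX_EXT);
        System.out.println("merged call: " + merged.listCalls); // prints "merged call: 1"
        System.out.println(grouped.get(UPDATE_DELTA_EXT)); // prints "[part-0-1.carbondata, part-0-2.carbondata]"
    }
}
```

Both variants return the same files per type; the merged version simply pays for one directory listing instead of one per file type, which is where the savings come from when a segment holds many files.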
Does this PR introduce any user interface change?
- No
Is any new testcase added?
- No
Can one of the admins verify this patch?
add to whitelist
Build failed with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2330/
Build failed with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4069/
Build succeeded with Spark 2.4.5. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2331/
Build succeeded with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4070/
@nstang01, is this different from #3986? If it is the same, maybe you can close this PR.