
[CARBONDATA-4346] Remove list files while query and invalid cache

Open ShreelekhyaG opened this issue 1 year ago • 27 comments

Why is this PR needed?

  1. Incremental updates on a partition table show a performance degradation.
  • During an update, the prune step lists the files under the segment path to collect the carbondata files and build the fileNameToMetaInfoMapping map. On a partition table, the number of invalid files grows with every incremental update, which degrades the createCarbonDataFileBlockMetaInfoMapping method.
  • Example: assume a single partition with 1000 carbondata files. The 1st update adds 900 new carbondata files. A 2nd update (the same update query) adds another 900 carbondata files, and the files added by the 1st update are now invalid. A subsequent query lists the files, includes the invalid ones as well, and adds them to the fileNameToMetaInfoMapping map. Because the number of invalid files keeps growing with each update, building this map keeps getting slower.
  2. The cache for invalid segments is not cleared after a delete or update.
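
The growth described in the example can be reproduced with a toy model. This is a minimal Java sketch, not CarbonData code: `PartitionDir`, `update` and `mappingFromListing` are hypothetical names, a partition is modeled as a list of file names, and each update is assumed to invalidate the files written by the previous update while leaving them on disk (the original files are assumed to stay valid). A map built from a directory listing therefore picks up every stale file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of one partition directory, not CarbonData's API.
// Assumption: an update writes new files and marks the previous update's files
// invalid, but the invalid files stay on disk until they are cleaned up.
class PartitionDir {
  private final List<String> filesOnDisk = new ArrayList<>();
  private final Set<String> validFiles = new HashSet<>();
  private List<String> lastUpdateFiles = new ArrayList<>();
  private int updateCount = 0;

  PartitionDir(int initialFiles) {
    for (int i = 0; i < initialFiles; i++) {
      String name = "part-0-" + i + ".carbondata";
      filesOnDisk.add(name);
      validFiles.add(name);
    }
  }

  // Simulates one update: invalidates the previous update's files
  // (they remain on disk) and writes newFiles fresh files.
  void update(int newFiles) {
    updateCount++;
    validFiles.removeAll(lastUpdateFiles);
    List<String> written = new ArrayList<>();
    for (int i = 0; i < newFiles; i++) {
      String name = "part-" + updateCount + "-" + i + ".carbondata";
      filesOnDisk.add(name);
      validFiles.add(name);
      written.add(name);
    }
    lastUpdateFiles = written;
  }

  // Listing-based approach: every physical file ends up in the map,
  // including files that no longer belong to the table.
  Map<String, Long> mappingFromListing() {
    Map<String, Long> map = new HashMap<>();
    for (String f : filesOnDisk) {
      map.put(f, 0L); // placeholder for per-file metadata (size, locations, ...)
    }
    return map;
  }

  int validFileCount() {
    return validFiles.size();
  }
}
```

With the example's numbers (1000 initial files, two updates of 900 files each), the listing-based map ends up with 2800 entries while only 1900 files are still valid in this model; every further update of the same size adds another 900 stale entries.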

What changes were proposed in this PR?

  1. Instead of listing files, get the carbon file from the file name and create the BlockMetaInfo directly in createBlockMetaInfo.
     Impact when tested on a single partition with 100 segments:
     • Significant improvement in the incremental update operation.
     • 95% improvement in the first select count(*) operation, because the select count(*) flow listed files for each segment and did not reuse the map.
     Impact when tested on a non-partition table with 100 segments:
     • Little or no improvement.

  2. Clear invalid/deleted segments from the cache after delete and update operations.
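
Both changes can be sketched together. This is a minimal, hypothetical Java sketch rather than the PR's implementation: `SegmentCache`, `blockMetaInfoFor`, `evictInvalid` and `statFile` are illustrative names, a file size stands in for BlockMetaInfo, and the names of a segment's valid carbondata files are assumed to be known already, so each file is resolved by name instead of listing the segment directory.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToLongFunction;

// Hypothetical sketch of both changes, not CarbonData's actual classes.
// Assumptions: the valid carbondata file names of a segment are already known,
// and statFile resolves a single file by name (no directory listing).
class SegmentCache {
  // segmentId -> (file name -> file size); the size stands in for BlockMetaInfo
  private final Map<String, Map<String, Long>> cache = new HashMap<>();
  private final ToLongFunction<String> statFile;

  SegmentCache(ToLongFunction<String> statFile) {
    this.statFile = statFile;
  }

  // Change 1: build per-file metadata only for the known file names, so stale
  // files left behind by earlier updates are never touched. The result is
  // cached per segment and reused by later queries.
  Map<String, Long> blockMetaInfoFor(String segmentId, List<String> validFileNames) {
    return cache.computeIfAbsent(segmentId, id -> {
      Map<String, Long> info = new LinkedHashMap<>();
      for (String name : validFileNames) {
        info.put(name, statFile.applyAsLong(name));
      }
      return info;
    });
  }

  // Change 2: after a delete or update, drop cached entries for segments
  // that are no longer valid.
  void evictInvalid(Set<String> validSegmentIds) {
    cache.keySet().retainAll(validSegmentIds);
  }

  int cachedSegments() {
    return cache.size();
  }
}
```

Because metadata is built only for the names passed in, stale files from earlier updates never enter the map, and `evictInvalid` keeps the cache from holding entries for segments removed by a delete or update. Caching per segment with `computeIfAbsent` also lets repeated queries such as select count(*) reuse the map instead of rebuilding it.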

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

ShreelekhyaG, Jul 04 '22 14:07

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/772/

CarbonDataQA2, Jul 04 '22 14:07

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6386/

CarbonDataQA2, Jul 04 '22 14:07

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4641/

CarbonDataQA2, Jul 04 '22 14:07

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6387/

CarbonDataQA2, Jul 04 '22 17:07

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4642/

CarbonDataQA2, Jul 04 '22 17:07

Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/773/

CarbonDataQA2, Jul 04 '22 18:07

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6390/

CarbonDataQA2, Jul 08 '22 12:07

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4645/

CarbonDataQA2, Jul 08 '22 13:07

Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/776/

CarbonDataQA2, Jul 08 '22 13:07

LGTM

Indhumathi27, Jul 11 '22 11:07

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6393/

CarbonDataQA2, Jul 12 '22 15:07

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4650/

CarbonDataQA2, Jul 12 '22 16:07

Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/779/

CarbonDataQA2, Jul 12 '22 16:07

retest this please

ShreelekhyaG, Jul 13 '22 04:07

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6395/

CarbonDataQA2, Jul 13 '22 07:07

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4652/

CarbonDataQA2, Jul 13 '22 07:07

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/781/

CarbonDataQA2, Jul 13 '22 08:07

retest this please

ShreelekhyaG, Jul 13 '22 09:07

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4653/

CarbonDataQA2, Jul 13 '22 09:07

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/782/

CarbonDataQA2, Jul 13 '22 09:07

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6396/

CarbonDataQA2, Jul 13 '22 12:07

LGTM

akashrn5, Jul 14 '22 03:07

retest this please

ShreelekhyaG, Jul 14 '22 05:07

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6397/

CarbonDataQA2, Jul 14 '22 07:07

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4654/

CarbonDataQA2, Jul 14 '22 07:07

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/783/

CarbonDataQA2, Jul 14 '22 08:07

Build Failed with Spark 2.4.5, Please check CI http://159.138.8.58:12602/job/ApacheCarbon_PR_Builder_2.4.5/4659/

shenjiayu17, Oct 09 '22 15:10