
[Feature] Add a block cache module to improve external table query performance

Open GavinMar opened this issue 2 years ago • 1 comments

What type of PR is this:

  • [ ] BugFix
  • [x] Feature
  • [ ] Enhancement
  • [ ] Refactor
  • [ ] UT
  • [ ] Doc
  • [ ] Tool

Which issues does this PR fix:

Fixes #

Problem Summary (Required):

We implement a block cache module that caches remote data from HDFS, S3, and similar storage systems in order to improve external table query performance. This PR mainly completes the following two items:

  1. Implement a block cache based on the CacheLib library.
  2. Add a CacheInputStream wrapper that adapts reads to the block cache interface.
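
To illustrate how the two pieces could fit together, here is a minimal read-through sketch. It is not the code in this PR: the `BlockCache` below is a hypothetical in-memory stand-in for the CacheLib-backed cache, and the `read`/`write` methods, the `RemoteReader` callback, and the `read_at` signature are all assumptions made for the example. It has no eviction, persistence, or thread safety.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the CacheLib-backed block cache (assumption):
// a plain in-memory map keyed by (file, block id).
class BlockCache {
public:
    explicit BlockCache(size_t block_size) : _block_size(block_size) {
        assert(block_size > 0);
    }

    size_t block_size() const { return _block_size; }

    // Returns the cached block, or nullptr on a miss.
    const std::string* read(const std::string& file, size_t block_id) const {
        auto it = _blocks.find(make_key(file, block_id));
        return it == _blocks.end() ? nullptr : &it->second;
    }

    void write(const std::string& file, size_t block_id, std::string data) {
        _blocks[make_key(file, block_id)] = std::move(data);
    }

private:
    static std::string make_key(const std::string& file, size_t block_id) {
        return file + "#" + std::to_string(block_id);
    }

    size_t _block_size;
    std::unordered_map<std::string, std::string> _blocks;
};

// Hypothetical callback that reads `count` bytes at `offset` from remote
// storage (HDFS, S3, ...).
using RemoteReader = std::function<std::string(size_t offset, size_t count)>;

// Serves reads from the cache and fetches whole blocks from remote on a miss.
class CacheInputStream {
public:
    CacheInputStream(std::string file, size_t file_size, BlockCache* cache, RemoteReader remote)
            : _file(std::move(file)), _file_size(file_size), _cache(cache), _remote(std::move(remote)) {}

    std::string read_at(size_t offset, size_t count) {
        std::string out;
        if (offset >= _file_size) return out;
        const size_t end = offset + std::min(count, _file_size - offset);
        const size_t bs = _cache->block_size();
        for (size_t pos = offset; pos < end;) {
            const size_t block_id = pos / bs;
            const size_t block_start = block_id * bs;
            const std::string* block = _cache->read(_file, block_id);
            if (block == nullptr) {
                // Miss: fetch the whole block once and populate the cache.
                const size_t block_len = std::min(bs, _file_size - block_start);
                _cache->write(_file, block_id, _remote(block_start, block_len));
                block = _cache->read(_file, block_id);
                ++_remote_reads;
            }
            const size_t in_block = pos - block_start;
            if (in_block >= block->size()) break; // short remote read, stop early
            const size_t n = std::min(end - pos, block->size() - in_block);
            out.append(*block, in_block, n); // copies bytes out of the cached block
            pos += n;
        }
        return out;
    }

    size_t remote_reads() const { return _remote_reads; }

private:
    std::string _file;
    size_t _file_size;
    BlockCache* _cache;
    RemoteReader _remote;
    size_t _remote_reads = 0;
};
```

In this sketch a read spanning two blocks triggers two remote fetches the first time and none afterwards, which is the effect a block cache aims for on repeated scans of the same remote files. Note that `read_at` copies bytes out of each cached block into the result buffer.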

Perf Test:

  • dataset: SSB-100GB
  • cpu: 32vCPU
  • cache memory size: 20GB

The results are as follows:

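| Query | Disable Cache | Enable Cache | Disable Cache / Enable Cache |
|-------|---------------|--------------|------------------------------|
| Q01   | 3440          | 1020         | 3.37                         |
| Q02   | 3463          | 951          | 3.64                         |
| Q03   | 3487          | 847          | 4.12                         |
| Q04   | 6787          | 1029         | 6.60                         |
| Q05   | 6669          | 878          | 7.60                         |
| Q06   | 6673          | 808          | 8.26                         |
| Q07   | 6183          | 1745         | 3.54                         |
| Q08   | 5811          | 931          | 6.24                         |
| Q09   | 5747          | 892          | 6.44                         |
| Q010  | 5550          | 1084         | 5.12                         |
| Q011  | 8675          | 1957         | 4.43                         |
| Q012  | 8626          | 1800         | 4.79                         |
| Q013  | 8605          | 1672         | 5.15                         |
| SUM   | 79716         | 15614        | 5.11                         |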
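
For readers who want to check the last column, the following sketch recomputes the totals and the overall ratio from the per-query numbers reported above. The `QueryTiming` struct and the helper names are hypothetical, introduced only for this example; the PR does not state the unit of the timings, so none is assumed here.

```cpp
#include <vector>

// One row of the reported results (unit not stated in the PR).
struct QueryTiming {
    const char* query;
    int cache_disabled;
    int cache_enabled;
};

struct Totals {
    int disabled;
    int enabled;
};

// Per-query numbers copied from the results reported in this PR.
inline const std::vector<QueryTiming>& ssb_timings() {
    static const std::vector<QueryTiming> rows = {
            {"Q01", 3440, 1020},  {"Q02", 3463, 951},   {"Q03", 3487, 847},
            {"Q04", 6787, 1029},  {"Q05", 6669, 878},   {"Q06", 6673, 808},
            {"Q07", 6183, 1745},  {"Q08", 5811, 931},   {"Q09", 5747, 892},
            {"Q010", 5550, 1084}, {"Q011", 8675, 1957}, {"Q012", 8626, 1800},
            {"Q013", 8605, 1672},
    };
    return rows;
}

// Sums both timing columns across all queries.
inline Totals sum_timings(const std::vector<QueryTiming>& rows) {
    Totals t{0, 0};
    for (const auto& r : rows) {
        t.disabled += r.cache_disabled;
        t.enabled += r.cache_enabled;
    }
    return t;
}

// Ratio of the time with the cache disabled to the time with it enabled.
inline double speedup(int disabled, int enabled) {
    return static_cast<double>(disabled) / static_cast<double>(enabled);
}
```

Summing the columns this way gives 79716 and 15614, and their ratio rounds to the 5.11 shown in the SUM row.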

Checklist:

  • [x] I have added test cases for my bug fix or my new feature
  • [ ] I have added user document for my new feature or new function

GavinMar, Sep 23 '22 05:09

After this PR, we should introduce zero-copy semantics for the cache.

DorianZheng, Sep 26 '22 03:09

> After this PR, we should introduce zero-copy semantics for the cache.

OK, it looks like the copying could become a performance bottleneck for us.

GavinMar, Sep 26 '22 03:09

run starrocks_admit_test

GavinMar, Sep 26 '22 06:09

[FE PR Coverage Check]

:heart_eyes: pass : 0 / 0 (0%)

wanpengfei-git, Sep 26 '22 16:09

run starrocks_admit_test

wanpengfei-git, Sep 27 '22 06:09