starrocks
[Feature] Add a block cache module to improve external table query performance
What type of PR is this:
- [ ] BugFix
- [x] Feature
- [ ] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Which issues does this PR fix:
Fixes #
Problem Summary(Required) :
This PR adds a block cache module that caches remote data read from HDFS, S3, and similar storage, in order to improve external table query performance. It does two main things:
- Implements a block cache based on the CacheLib library.
- Wraps the input stream in a CacheInputStream that adapts reads to the block cache interface.
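
For readers unfamiliar with the pattern, the read-through flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RemoteFile`, `BlockCache`, the `"<name>:<block index>"` key format, and the `CacheInputStream` constructor shown here are all hypothetical, and a plain `std::unordered_map` stands in for CacheLib (no eviction, no thread safety).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a remote file (HDFS, S3, ...).
// It counts reads so the effect of caching is observable.
class RemoteFile {
public:
    explicit RemoteFile(std::string data) : _data(std::move(data)) {}

    // Copies up to `size` bytes starting at `offset` into `out`; returns bytes read.
    size_t read_at(size_t offset, char* out, size_t size) {
        ++remote_reads;
        if (offset >= _data.size()) return 0;
        size_t n = std::min(size, _data.size() - offset);
        std::memcpy(out, _data.data() + offset, n);
        return n;
    }

    size_t size() const { return _data.size(); }
    int remote_reads = 0;

private:
    std::string _data;
};

// Hypothetical in-memory block cache. A real implementation would delegate
// to CacheLib and handle capacity and eviction.
class BlockCache {
public:
    const std::string* lookup(const std::string& key) const {
        auto it = _blocks.find(key);
        return it == _blocks.end() ? nullptr : &it->second;
    }
    void insert(const std::string& key, std::string block) { _blocks[key] = std::move(block); }

private:
    std::unordered_map<std::string, std::string> _blocks;
};

// Read-through wrapper: serves reads from cached fixed-size blocks and
// fetches a block from the remote file only on a cache miss.
class CacheInputStream {
public:
    CacheInputStream(std::string name, RemoteFile* file, BlockCache* cache, size_t block_size)
            : _name(std::move(name)), _file(file), _cache(cache), _block_size(block_size) {
        assert(block_size > 0);
    }

    size_t read_at(size_t offset, char* out, size_t size) {
        size_t done = 0;
        while (done < size && offset + done < _file->size()) {
            size_t pos = offset + done;
            const std::string& block = load_block(pos / _block_size);
            size_t in_block = pos % _block_size;
            if (in_block >= block.size()) break;
            size_t n = std::min(size - done, block.size() - in_block);
            std::memcpy(out + done, block.data() + in_block, n);  // copy out of the cache
            done += n;
        }
        return done;
    }

private:
    const std::string& load_block(size_t index) {
        std::string key = _name + ":" + std::to_string(index);
        if (const std::string* hit = _cache->lookup(key)) return *hit;
        // Cache miss: read the whole block remotely, then populate the cache.
        std::string block(_block_size, '\0');
        size_t n = _file->read_at(index * _block_size, &block[0], _block_size);
        block.resize(n);
        _cache->insert(key, std::move(block));
        return *_cache->lookup(key);
    }

    std::string _name;
    RemoteFile* _file;
    BlockCache* _cache;
    size_t _block_size;
};
```

In this sketch, a read that spans two blocks triggers two remote reads the first time and none on later reads of the same range. Note that every hit still copies bytes out of the cached block with `memcpy`.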
Perf Test:
- dataset: SSB-100GB
- cpu: 32vCPU
- cache memory size: 20GB
The results are as follows:
Query | Disable Cache | Enable Cache | Speedup (Disable Cache / Enable Cache) |
---|---|---|---|
Q01 | 3440 | 1020 | 3.37 |
Q02 | 3463 | 951 | 3.64 |
Q03 | 3487 | 847 | 4.12 |
Q04 | 6787 | 1029 | 6.60 |
Q05 | 6669 | 878 | 7.60 |
Q06 | 6673 | 808 | 8.26 |
Q07 | 6183 | 1745 | 3.54 |
Q08 | 5811 | 931 | 6.24 |
Q09 | 5747 | 892 | 6.44 |
Q10 | 5550 | 1084 | 5.12 |
Q11 | 8675 | 1957 | 4.43 |
Q12 | 8626 | 1800 | 4.79 |
Q13 | 8605 | 1672 | 5.15 |
SUM | 79716 | 15614 | 5.11 |
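
For reference, the last column can be reproduced from the two timing columns. The sketch below is only illustrative; `QueryTiming`, `speedup`, and `total_speedup` are hypothetical names. The SUM row is the ratio of the summed timings (79716 / 15614), not the mean of the per-query ratios.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record for one row of the table above.
struct QueryTiming {
    const char* name;
    double disabled;  // timing with the cache disabled
    double enabled;   // timing with the cache enabled
};

// Per-query speedup: Disable Cache / Enable Cache.
double speedup(const QueryTiming& t) {
    assert(t.enabled > 0);
    return t.disabled / t.enabled;
}

// Overall speedup as in the SUM row: sum of disabled timings over sum of enabled timings.
double total_speedup(const std::vector<QueryTiming>& rows) {
    double sum_disabled = 0;
    double sum_enabled = 0;
    for (const QueryTiming& t : rows) {
        sum_disabled += t.disabled;
        sum_enabled += t.enabled;
    }
    assert(sum_enabled > 0);
    return sum_disabled / sum_enabled;
}
```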
Checklist:
- [x] I have added test cases for my bug fix or my new feature
- [ ] I have added user documentation for my new feature or new function
After this PR, we should introduce zero-copy semantics for the cache.
OK, looks like it could become a performance bottleneck for us.
run starrocks_admit_test
[FE PR Coverage Check]
:heart_eyes: pass : 0 / 0 (0%)
run starrocks_admit_test