Update SelectDB and Apache Doris performance on metal machine
Hi @chhetripradeep @alexey-milovidov, this is another PR that revises the former one. Regarding the points you mentioned in #201, here is some information for your reference.
Query Result Caching
Query result caching is disabled in every published version of SelectDB/Apache Doris. As for the performance difference you discovered between the cold run and the hot runs, it is because of a built-in page cache, which is very common in mainstream data warehouses. To prove that, I provided two query profiles in this reply.
Wrong Query Response
As you mentioned, two queries return wrong results in SelectDB/Apache Doris. However, after comparing with DuckDB 1.0, I found that SelectDB/Apache Doris returns the same Q29 result as DuckDB, which differs from the ClickHouse result. Details for reproducing this result are also presented here.
As for Q14, I checked SelectDB/Apache Doris and found that the version I used has a critical bug that causes this problem. However, I also found that this bug has almost no impact on performance. After fixing it, I updated the binary and resubmitted the performance results in this PR.
Specialized Build
SelectDB is a modern real-time data warehouse powered by open-source Apache Doris. It is a commercial product used for lightning-fast analytics on real-time data at scale, so we do not want to open-source it. However, the binary we used, https://doris-clickbench.oss-ap-southeast-1.aliyuncs.com/selectdb-linux_x64.tar.gz, is built from a recent development branch that has already been verified by our internal CI/CD, so users can choose to use it as they wish.
Hi @Gabriel39
Thank you for raising the PR again and for answering my questions from https://github.com/ClickHouse/ClickBench/pull/201. I have a few follow-up questions.
As for your discovery of performance difference between cold run and hot run, it is because a built-in page cache
I would like to understand the built-in page cache better; it would be very nice if you can share any documentation link or code implementation link related to it. Since SelectDB is commercial, it would be nice if you could share links for the open-source counterpart, i.e. Apache Doris.
According to ClickBench rules, the first run should be the cold run, and the second and third runs should be the hot runs.
To fulfill this requirement, we currently encourage all databases to drop the Linux filesystem cache. However, databases do not necessarily keep their cached data in the Linux filesystem cache. In those cases, dropping the Linux filesystem cache has no effect on the caches the database maintains itself, and all query runs behave like hot runs. We are noticing similar behaviour with SelectDB and Apache Doris too. If SelectDB have any other command to drop this built-in page cache before first run of every query, I would recommend to add it to fulfill this requirement.
I have one follow-up question for you. I noticed that the cold run of Q24 is quite fast in SelectDB. Looking into it, I saw that SelectDB reads a lot less data from disk for this query than other databases do. I would be interested to know more about it.
Here are the Q24 timings of the databases on c6a.metal:
| Database | Query Time (in seconds) |
|---|---|
| chdb | 55.87 |
| clickhouse | 10.75 |
| databend | 10.06 |
| doris | 0.41 |
| duckdb | 106.65 |
| glaredb | 59.37 |
| questdb-partitioned | 8.02 |
| selectdb | 0.36 |
| starrocks | 28.25 |
| umbra | 127.19 |
For the cold run of Q24, the amount of data read from disk by:
- SelectDB: https://pastila.nl/?012cb355/6398e8b3b6e55907eadc74ab1eaa1cb2#hZXk19Asf0sTN6e/EOGqag==
- ClickHouse: https://pastila.nl/?00d723b5/b8879a2bad507787fedec626a2d4fda5#AYAYHJWIPe5mAlJ8O9igRA==
I have seen similar behaviour for other queries too.
Hi @chhetripradeep
Here are my replies to your questions
it would be very nice if you can share any documentation link
Unfortunately, there is no specific document that presents the details, but I think it is uncomplicated. Like most systems, SelectDB/Apache Doris uses a basic LRU cache as its data cache. When a page is first accessed, it is decompressed and cached whole in the data cache (if no space is left, an LRU strategy is used to evict cached pages). When this page is accessed again later, the access is more efficient because we do not need to read from disk and decompress the data files.
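To make the behaviour described above concrete, here is a minimal Python sketch of an LRU cache of decompressed pages. It is only an illustration and not the Apache Doris implementation: the class name `PageCache`, the `load_compressed` callback, the use of zlib, and the byte-based capacity are all assumptions made for this example.

```python
from collections import OrderedDict
import zlib


class PageCache:
    """Toy LRU cache of decompressed pages (illustrative, not Doris code)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # page_id -> decompressed bytes, ordered from least to most recently used
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, page_id, load_compressed):
        """Return the decompressed page, reading and decompressing on a miss."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            self.hits += 1
            return self.pages[page_id]
        self.misses += 1
        # Hypothetical callback standing in for a disk read of a compressed page.
        data = zlib.decompress(load_compressed(page_id))
        self._insert(page_id, data)
        return data

    def _insert(self, page_id, data):
        if len(data) > self.capacity_bytes:
            return  # page is larger than the whole cache, do not cache it
        while self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self.pages.popitem(last=False)  # evict least recently used
            self.used_bytes -= len(evicted)
        self.pages[page_id] = data
        self.used_bytes += len(data)

    def clear(self):
        """Drop every cached page, e.g. before a cold run."""
        self.pages.clear()
        self.used_bytes = 0
```

A cache like this lives inside the database process, so dropping the operating system's filesystem cache does not empty it. Only something like `clear()` (or a restart) does.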
If SelectDB have any other command to drop this built-in page cache before first run of every query, I would recommend to add it to fulfill this requirement.
Currently, the page cache can only be enabled or disabled in the config files, which are set before the cluster starts up. Besides, when I looked at other systems, I found that most of them use the same cache strategy as SelectDB/Doris. So I think if you want to make the scores not affected by built-in page cache, you should disable all caches and test all systems from scratch. It is not fair to only request SelectDB to do it.
Speed of Q24
As you saw, SelectDB/Apache Doris is faster because less data is accessed. To achieve this, SelectDB/Apache Doris uses a heap sort algorithm to execute the ORDER BY ... LIMIT ... in Q24, and the top of the heap is used as a predicate that is pushed down to the scan layer.
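The idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SelectDB/Apache Doris code: it assumes an ascending `ORDER BY` on a single numeric column, and it assumes that each page carries a minimum value the scan can check before reading the page. The function name `top_n_with_pushdown` and the page layout are hypothetical.

```python
import heapq


def top_n_with_pushdown(pages, k):
    """Return the k smallest values plus the number of pages actually read.

    `pages` is a list of (min_value, max_value, values) tuples; the per-page
    min/max statistics are an assumption of this sketch.
    """
    heap = []  # max-heap via negated values; holds the k smallest values so far
    pages_read = 0
    for page_min, _page_max, values in pages:
        # Once the heap is full, its top (the current k-th smallest value)
        # acts as a predicate: a page whose minimum is not below it cannot
        # contribute any row to the result, so the scan skips it entirely.
        if len(heap) == k and page_min >= -heap[0]:
            continue
        pages_read += 1
        for v in values:
            if len(heap) < k:
                heapq.heappush(heap, -v)
            elif v < -heap[0]:
                heapq.heapreplace(heap, -v)  # replace the current k-th smallest
    return sorted(-x for x in heap), pages_read
```

The more pages the threshold rules out, the less data is read from disk, which is consistent with the low read volume observed for the cold run of Q24.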
@chhetripradeep I have already run the benchmark again without the page cache in SelectDB. PTAL.
So I think if you want to make the scores not affected by built-in page cache, you should disable all caches and test all systems from scratch. It is not fair to only request SelectDB to do it.
This is a requirement that every system must follow in order to be added. If you see any issues with other systems, please point them out, as we normally verify that this step is done and that the first query run does not use the page cache.
It is okay if the system performs caching for source data (buffer pools and similar). If the cache or buffer pools can be flushed, they should be flushed before the first run of every query.
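The run protocol described here (flush caches, then one cold run followed by hot runs) can be sketched as a small Python loop. This is not the actual ClickBench script: `benchmark`, `run_query`, and `flush_caches` are hypothetical names, and what `flush_caches` does is system-specific (for example, dropping the OS filesystem cache and clearing any database-managed cache).

```python
import time


def benchmark(queries, run_query, flush_caches, tries=3):
    """Time each query `tries` times, flushing caches before the first try.

    `run_query` and `flush_caches` are hypothetical callables supplied by the
    caller; they stand in for the system-specific query and flush commands.
    """
    results = []
    for query in queries:
        flush_caches()  # makes the first try a cold run
        timings = []
        for _ in range(tries):
            start = time.perf_counter()
            run_query(query)
            timings.append(time.perf_counter() - start)
        results.append(timings)  # [cold, hot, hot] when tries == 3
    return results
```

If `flush_caches` only clears the OS filesystem cache while the database keeps its own page cache, the first try of a query whose pages were already loaded is effectively a hot run, which is the concern raised in this thread.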
Hi @Gabriel39
Thank you for accommodating our request to drop the page cache before the first run of every query. Everything looks good. It would be very nice if we could include /api/clear_data_cache in the Apache Doris benchmarks too.
Hi @chhetripradeep I will do this in the next release of Apache Doris and then update its results, so in this PR I only want to update the SelectDB scores.
@chhetripradeep PTAL
Sorry for the delay. Looks good.
This one is not addressed: https://github.com/ClickHouse/ClickBench/pull/203#pullrequestreview-2141357701