Update SelectDB and Apache Doris performance on metal machine

Open Gabriel39 opened this issue 1 year ago • 6 comments

Thank You for Your Contribution!

We appreciate your effort and contribution to the project. To ensure that your Pull Request (PR) adheres to our guidelines, please review the rules in our contribution guidelines:

ClickHouse/ClickBench Contribution Rules

Thank you for your attention to these details and for helping us maintain the quality and integrity of the project.

Gabriel39 avatar Jun 18 '24 06:06 Gabriel39

Hi @chhetripradeep @alexey-milovidov, this is a new PR that revises the former one. For the points you raised in #201, here is some information for your reference.

Query Result Caching

Query result caching is definitely disabled in every published version of SelectDB/Apache Doris. As for your discovery of the performance difference between cold run and hot run, it is because of a built-in page cache, which is very common in mainstream data warehouses. To prove that, I have provided the two query profiles in this reply.

Wrong Query Response

As you mentioned, 2 queries return wrong responses in SelectDB/Apache Doris. However, after comparing with DuckDB 1.0, I found that SelectDB/Apache Doris returns the same Q29 results as DuckDB, and these differ from ClickHouse's. Details to reproduce this result are also presented here.

For the Q14 problem, I checked SelectDB/Apache Doris and found that the version I used has a critical bug which leads to this problem. However, I also found that this bug has almost no impact on performance. After fixing it, I updated the binary and re-submitted the performance results in this PR.

Specialized Build

SelectDB is a modern real-time data warehouse powered by open-source Apache Doris. It is a commercial product used for lightning-fast analytics on real-time data at scale, so we do not want to make it open-source. However, the binary we used, https://doris-clickbench.oss-ap-southeast-1.aliyuncs.com/selectdb-linux_x64.tar.gz, is built from a recent development branch that has already been verified by our internal CI/CD, so users can choose to use it as they want.

Gabriel39 avatar Jun 18 '24 07:06 Gabriel39

Hi @Gabriel39

Thank you for raising the PR again and for responding to my questions from https://github.com/ClickHouse/ClickBench/pull/201. I have a few follow-up questions.

As for your discovery of the performance difference between cold run and hot run, it is because of a built-in page cache

I would like to understand the built-in page cache better. It would be very nice if you can share any documentation link or code implementation link related to this built-in page cache. Since SelectDB is commercial, it would be nice if you can share the links for the open-source counterpart, i.e. Apache Doris.

According to the ClickBench rules, the first run should be a cold run, and the second and third runs should be hot runs.

To fulfill this requirement, we currently encourage all databases to drop the Linux filesystem cache. However, databases do not necessarily keep their cached data in the Linux filesystem cache. In those cases, dropping the Linux filesystem cache has no impact on the caches the database actually maintains, and all query runs behave like hot runs. We are noticing similar behaviour with SelectDB and Apache Doris too. If SelectDB has any other command to drop this built-in page cache before the first run of every query, I would recommend adding it to fulfill this requirement.

I have one follow-up question for you. I noticed that the cold run of Q24 is quite fast in SelectDB. Looking into it, I saw that SelectDB reads a lot less data from disk for this query than other databases do. I would be interested to know more about it.

Here is the list of database timings for Q24 on c6a.metal:

Database               Query time (seconds)
chdb                                  55.87
clickhouse                            10.75
databend                              10.06
doris                                  0.41
duckdb                               106.65
glaredb                               59.37
questdb-partitioned                    8.02
selectdb                               0.36
starrocks                             28.25
umbra                                127.19

For the cold run of Q24, the amount of data read from disk by each system:

  • SelectDB : https://pastila.nl/?012cb355/6398e8b3b6e55907eadc74ab1eaa1cb2#hZXk19Asf0sTN6e/EOGqag==
  • ClickHouse: https://pastila.nl/?00d723b5/b8879a2bad507787fedec626a2d4fda5#AYAYHJWIPe5mAlJ8O9igRA==

I have seen similar behaviour for other queries too.

chhetripradeep avatar Jun 19 '24 12:06 chhetripradeep

Hi @chhetripradeep

Here are my replies to your questions

it would be very nice if you can share any documentation link

Unfortunately, there is no specific document that presents the details, but I think it is uncomplicated. Like most systems, SelectDB/Apache Doris uses a basic LRU cache as its data cache. When a page is first accessed, it is decompressed and cached whole in the data cache (if no space is left, an LRU strategy evicts cached pages). When this page is accessed later, the access is more efficient because we do not need to read from disk or decompress data files.
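
To make the behaviour described above concrete, here is a minimal sketch of an LRU cache of decompressed pages. It is not the SelectDB/Apache Doris implementation: the class name `PageCache`, the `disk` dictionary and the use of `zlib` are all assumptions made for illustration.

```python
from collections import OrderedDict
import zlib


class PageCache:
    """Toy LRU cache of decompressed pages (hypothetical, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of cached pages
        self.pages = OrderedDict()    # page_id -> decompressed bytes, oldest first
        self.disk_reads = 0           # simulated disk reads plus decompressions

    def get(self, page_id, disk):
        if page_id in self.pages:
            # Hit: mark as most recently used; no disk access, no decompression.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Miss: read the compressed page from "disk" and decompress it.
        self.disk_reads += 1
        data = zlib.decompress(disk[page_id])
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            # No space left: evict the least recently used page.
            self.pages.popitem(last=False)
        return data

    def clear(self):
        # Dropping the cache makes the next access to every page cold again.
        self.pages.clear()


# Simulated on-disk pages, stored compressed.
disk = {i: zlib.compress(f"page-{i}".encode() * 100) for i in range(4)}

cache = PageCache(capacity=2)
cache.get(0, disk)   # cold access: miss
cache.get(0, disk)   # hot access: hit
cache.get(1, disk)   # miss
cache.get(2, disk)   # miss, evicts page 0
cache.get(0, disk)   # miss again, evicts page 1
print(cache.disk_reads)
```

The second access to page 0 is the "hot run" case: it is served from memory without touching the simulated disk, which is why dropping only the operating system's file cache would not make it cold.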

If SelectDB has any other command to drop this built-in page cache before the first run of every query, I would recommend adding it to fulfill this requirement.

Currently, the page cache can only be enabled or disabled in config files, which are set before the cluster starts up. Besides, when I looked at other systems, I found that most of them use the same cache strategy as SelectDB/Doris. So I think if you want to make the scores not affected by built-in page cache, you should disable all caches and test all systems from scratch. It is not fair to only request SelectDB to do it.

Speed of Q24

As you saw, SelectDB/Apache Doris is faster because it accesses less data. To achieve this, SelectDB/Apache Doris uses a heap sort algorithm to execute the ORDER BY ... LIMIT ... in Q24, and the top of the heap can be used as a predicate and pushed down to the scan layer.
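
The idea can be sketched roughly as follows. This is only an illustration of heap-based top-N with a threshold predicate, not the actual SelectDB/Apache Doris code: the function name `top_n_with_pushdown` is hypothetical, keys are assumed to be numeric, and `min(block)` stands in for per-block min/max metadata that a real scan layer would consult without reading the block.

```python
import heapq


def top_n_with_pushdown(blocks, n):
    """Emulate ORDER BY key LIMIT n over blocks of numeric keys.

    Returns the n smallest keys and the number of blocks actually scanned.
    """
    heap = []       # max-heap via negated keys; holds the current n smallest keys
    scanned = 0
    for block in blocks:
        if len(heap) == n:
            threshold = -heap[0]    # top of heap: largest key among the current top n
            # Pushed-down predicate: only keys below the threshold can qualify.
            # min(block) is a stand-in for block metadata (assumption of this sketch).
            if min(block) >= threshold:
                continue            # skip the whole block without scanning it
        scanned += 1
        for key in block:
            if len(heap) < n:
                heapq.heappush(heap, -key)
            elif key < -heap[0]:
                # Key beats the current worst of the top n: replace the heap top.
                heapq.heapreplace(heap, -key)
    return sorted(-k for k in heap), scanned


blocks = [[5, 1, 9], [7, 8, 6], [2, 10, 11], [12, 13, 14]]
result, scanned = top_n_with_pushdown(blocks, 3)
print(result, scanned)
```

In this example the last block is never scanned, because its smallest key cannot beat the current top of the heap. The tighter the threshold becomes, the more data the scan can skip.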

Gabriel39 avatar Jun 20 '24 07:06 Gabriel39

@chhetripradeep I have already produced another set of results without the page cache in SelectDB. PTAL.

Gabriel39 avatar Jun 26 '24 09:06 Gabriel39

So I think if you want to make the scores not affected by built-in page cache, you should disable all caches and test all systems from scratch. It is not fair to only request SelectDB to do it.

It is a requirement that every system must follow in order to be added. If you see any issues with other systems, please point them out, as we normally verify that this step is done and that the first query run does not use the page cache.

It is okay if the system performs caching for source data (buffer pools and similar). If the cache or buffer pools can be flushed, they should be flushed before the first run of every query.
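
As a rough sketch of this rule, the harness below flushes caches once and then times three runs of a query, so the first run is cold and the next two are hot. The names `run_query_three_times`, `fake_flush` and `fake_query` are hypothetical. In a real setup `flush_caches` would drop the Linux filesystem cache and call whatever cache-clearing command the database offers; the exact invocation is not shown in this thread, so it is left as a callable here.

```python
import time


def run_query_three_times(run_query, flush_caches):
    """Time one query three times: a cold run, then two hot runs."""
    flush_caches()                 # drop OS file cache and any database-side cache
    timings = []
    for _ in range(3):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings                 # [cold, hot, hot]


# Stand-in demo: a fake "database" whose cache is a plain set.
cache = set()
calls = []


def fake_flush():
    cache.clear()
    calls.append("flush")


def fake_query():
    calls.append("hit" if "q" in cache else "miss")
    cache.add("q")


cache.add("q")                     # pretend an earlier query warmed the cache
timings = run_query_three_times(fake_query, fake_flush)
print(calls)
```

Without the flush, the first run in this demo would already be a cache hit, which is the situation described above for databases that keep their own cache outside the filesystem cache.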

qoega avatar Jun 26 '24 10:06 qoega

Hi @Gabriel39

Thank you for accommodating our request for dropping the page cache before first run of every query. Everything looks good. It will be very nice if we can include /api/clear_data_cache for Apache Doris benchmarks too.

chhetripradeep avatar Jul 02 '24 08:07 chhetripradeep

Hi @Gabriel39

Thank you for accommodating our request for dropping the page cache before first run of every query. Everything looks good. It will be very nice if we can include /api/clear_data_cache for Apache Doris benchmarks too.

Hi @chhetripradeep, I will do this in the next release of Apache Doris and then update its results. So I just want to update the SelectDB scores in this PR.

Gabriel39 avatar Jul 17 '24 03:07 Gabriel39

@chhetripradeep PTAL

Gabriel39 avatar Jul 19 '24 14:07 Gabriel39

Sorry for the delay. Looks good.

chhetripradeep avatar Aug 01 '24 08:08 chhetripradeep

This one is not addressed: https://github.com/ClickHouse/ClickBench/pull/203#pullrequestreview-2141357701

qoega avatar Aug 13 '24 11:08 qoega