Support Output MySQL format
Summary
Execution time of the query select * from customer limit 1000000 (customer is generated by tpc-h dbgen at a 100GB scale factor) is on the order of 10 seconds, whereas the same query takes less than 1 second in MySQL.
ps: tested on my own MacBook with a cluster of 3 query nodes
pprof: [profiling attachment not included]
Could you try a single databend-query node and test again? It is not useful to deploy 3 nodes on a single machine (IO- and CPU-bound). Also, if you are using the local fs engine, there are some known performance issues.
Yes, I used local fs as the engine. I tested with 1 query node before; it was just a little better. I'll test it again some time later.
Thank you. Databend is not designed or optimized for the local FS; we only support it for testing. There are some issues (e.g. https://github.com/tokio-rs/tokio/issues/3664) that slow down IO. Databend is optimized for cloud object storage, such as S3-compatible and Azure Blob storage.
Yes, I actually did not realize this. But the problem does not seem to be the storage engine; the performance bottleneck is in DFQueryResultWriter, between the MySQL client and the query server. Or am I misunderstanding?
Hmm, could you post the MySQL client query status, like:
select * from numbers(1000000) limit 1000000;
[result snippet]
1000000 rows in set (0.27 sec) -- this is the total cost: query server execution + sending the result to the client
Read 1000000 rows, 7.63 MiB in 0.009 sec -- this is the query server execution time alone
MySQL's DFQueryResultWriter is not a streaming writer, so for a query with limit 1000000 most of the cost is sending the result to the client.
Hmm, could you post the MySQL client query status?
1000000 rows in set (11.30 sec)
Read 1000000 rows, 155.81 MiB in 1.199 sec., 833.69 thousand rows/sec., 129.89 MiB/sec.
Table customer contains many string columns, and StringColumn::get(&self, index: usize) costs much time. I think it's because converting &[u8] to Vec requires many copy operations, and DFQueryResultWriter is not cache friendly.
I think it's because converting &[u8] to Vec requires many copy operations, and DFQueryResultWriter is not cache friendly.
That's right. We can refactor DFQueryResultWriter to output the MySQL format the way the Tsv/Csv/Parquet output formats do (https://github.com/datafuselabs/databend/blob/136196aba2127b6e62535d1336357d92b6e3a9eb/common/formats/src/output_format_parquet.rs).