Technical design decisions and ideas that are not present in ClickHouse

Open alexey-milovidov opened this issue 2 years ago • 5 comments

Please list some novel and interesting ideas that are not already present in ClickHouse.

alexey-milovidov avatar Jul 21 '22 08:07 alexey-milovidov

@alexey-milovidov Alexey, there is a paper about Velox in the upcoming VLDB.

Velox: Meta’s Unified Execution Engine - https://vldb.org/2022/?papers-industrial

mbasmanova avatar Jul 21 '22 11:07 mbasmanova

@mbasmanova there is no link to read the paper.

alexey-milovidov avatar Jul 23 '22 19:07 alexey-milovidov

@alexey-milovidov Alexey, I assume the paper will become available after the conference in a month or so. Would you like to join our Slack? Maybe we could provide a draft of the paper there. CC: @pedroerp @jijufb

mbasmanova avatar Jul 25 '22 14:07 mbasmanova

@alexey-milovidov if you send me an email at [email protected] I can share the current camera-ready version.

pedroerp avatar Jul 26 '22 00:07 pedroerp

@alexey-milovidov I have studied both the ClickHouse and Velox projects for a long time. At a high level, Velox is a native, unified execution engine packaged as a library (it does not manage data) that can accelerate execution in higher-level applications such as Spark (intel/gluten) or Presto (the C++ worker built on Velox), and even Flink and ML. It provides a lot, as described in the docs: https://facebookincubator.github.io/velox/

ClickHouse, by contrast, is a world-class analytics database, but it is not yet suitable for complex SQL (it lacks an exchange operator and cost-based optimization, CBO). At a low level, both use modern computing techniques such as SIMD, batch/vectorized processing, and expression compilation (Velox does not use LLVM IR for now).

We use ClickHouse at our company for our OLAP business needs, and as you know we have also contributed many features back to the community. We also want to use and contribute to Spark+Velox (similar to Photon at Databricks, see https://cs.stanford.edu/~matei/papers/2022/sigmod_photon.pdf) and Presto+Velox, for performance and cost reasons.

We are very much looking forward to ClickHouse being able to process large, complex SQL queries, which is currently the bottleneck limiting its wider use. However, it seems you have no plans to improve this, while Apache Impala, Apache Doris, and StarRocks have better distributed computing models.

zhanglistar avatar Aug 16 '22 03:08 zhanglistar

That's a great summary, thanks @zhanglistar. Converting this to a GitHub Discussion.

pedroerp avatar Sep 16 '22 23:09 pedroerp