ydb icon indicating copy to clipboard operation
ydb copied to clipboard

Any details available about columnshard?

Open nvartolomei opened this issue 2 years ago • 1 comments

Was reading the code and was wondering if there are any details about the "columnshard" and how it integrates into the big picture.

https://github.com/ydb-platform/ydb/tree/main/ydb/core/tx/columnshard

nvartolomei avatar Aug 11 '22 19:08 nvartolomei

ColumnShard is an alternative for DataShard (https://github.com/ydb-platform/ydb/tree/main/ydb/core/tx/datashard) at tablet layer. It uses PARTITION BY HASH logic instead of PARTITION BY RANGE (with dynamic ranges and auto splits) used in tables based on DataShards. ColumnShard places its data by columns in sparse index using Apache Arrow primitives.

The main ColumnShard's target is OLAP / HTAP scenario. It supports near-OLTP insertion speed (with batching and deduplication by PK on server side). It also supports operations pushdown: functions, filters and aggregates. DataShard reads data by rows, filters by PK predicate and makes resulting projection. ColumnShard read data by columns, allows to generate columns, filter it by predicates (not only by PK) and thus allows to return less data to compute layer for future processing.

For now there're restrictions for ColumnShard's usage:

  • DDL operations for general tables over ColumnShards are in progress. You're able to create them using LogStore iface only (https://github.com/ydb-platform/ydb/blob/main/ydb/public/api/grpc/draft/ydb_logstore_v1.proto)
  • There's no SQL to insert data into them. You could load data via ydb cli import file or via BulkUpsert grpc call
  • You have to use scan query to SELECT from them
  • You have to use Timestamp column as first PK column
  • You cannot use such tables in general YDB transactions

We're going to remove most of the restrictions in near future and describe them in documentation then.

4ertus2 avatar Sep 01 '22 15:09 4ertus2