ydb
ydb copied to clipboard
Any details available about columnshard?
Was reading the code and was wondering if there are any details about the "columnshard" and how it integrates into the big picture.
https://github.com/ydb-platform/ydb/tree/main/ydb/core/tx/columnshard
ColumnShard is an alternative for DataShard (https://github.com/ydb-platform/ydb/tree/main/ydb/core/tx/datashard) at tablet layer. It uses PARTITION BY HASH logic instead of PARTITION BY RANGE (with dynamic ranges and auto splits) used in tables based on DataShards. ColumnShard places its data by columns in sparse index using Apache Arrow primitives.
The main ColumnShard's target is OLAP / HTAP scenario. It supports near-OLTP insertion speed (with batching and deduplication by PK on server side). It also supports operations pushdown: functions, filters and aggregates. DataShard reads data by rows, filters by PK predicate and makes resulting projection. ColumnShard read data by columns, allows to generate columns, filter it by predicates (not only by PK) and thus allows to return less data to compute layer for future processing.
For now there're restrictions for ColumnShard's usage:
- DDL operations for general tables over ColumnShards are in progress. You're able to create them using LogStore iface only (https://github.com/ydb-platform/ydb/blob/main/ydb/public/api/grpc/draft/ydb_logstore_v1.proto)
- There's no SQL to insert data into them. You could load data via ydb cli import file or via BulkUpsert grpc call
- You have to use scan query to SELECT from them
- You have to use Timestamp column as first PK column
- You cannot use such tables in general YDB transactions
We're going to remove most of the restrictions in near future and describe them in documentation then.