clickhouse.rs
Support for native TCP protocol
Hi! Any ETA on native TCP support? This is a key feature for many production workloads where performance and full protocol support are critical.
There are the following pre-requisites:
- #221 - RBWNAT (RowBinaryWithNamesAndTypes) with selects; it also adds the column name/type header parser, which is a huge part of Native as well and can be fully reused.
- RBWNAT with insert
- Native over HTTP
Having all of that, we can then add optional Native over TCP while keeping the HTTP transport. I can't give an ETA, but we are actively looking into it.
Let's keep this issue as a feature request.
and full protocol support are critical.
@laruh what TCP protocol features do you consider critical for your case?
Hello!
We are evaluating a migration from Go to Rust for several of our high-load services.
Our current stack relies on the Go ClickHouse client, and we actively use:
- Data-packet streaming – columnar bulk INSERTs via the Batch API and streaming SELECTs.
- LZ4 by default (haven’t needed ZSTD yet, but it’s good to keep the option).
- ClientInfo to tag every query with service metadata.
- Cancel packets
- Ping / Pong
- In addition to simple types, we also rely on complex (semi-structured) column types such as LowCardinality, JSON, Map; full type coverage is preferable of course.
This gap is currently a blocker to adopting Rust in the projects.
Nice-to-have, but eventually important
- Query packet Settings.
- Realtime Progress packets and ProfileInfo summary.
- Tuning knobs for the connection pool (idle limits, open strategy, etc)
Data-packet streaming – columnar bulk INSERTs via the Batch API and streaming SELECTs.
Is your data coming in a columnar format already, or row-by-row?
LZ4 by default (haven’t needed ZSTD yet, but it’s good to keep the option).
LZ4 is enabled by default in this crate.
ClientInfo to tag every query with service metadata.
Looks like you could use Client::with_product_info.
Cancel packets
For select operations, you can just use cancel_http_readonly_queries_on_client_close, and just cancel the future from the code.
For other queries, it should be possible to work around that with KILL QUERY WHERE query_id = ...
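As a rough sketch of that workaround, the statement text could be built as below. It assumes the application assigned its own query_id when it issued the original query; `quote_literal` and `kill_query_sql` are hypothetical helper names, not part of the crate, and where the client offers parameter binding that is likely the safer route.

```rust
/// Quote a value as a string literal, escaping backslashes and single quotes.
/// Hypothetical helper for illustration only.
fn quote_literal(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('\'');
    for ch in value.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\'' => out.push_str("\\'"),
            _ => out.push(ch),
        }
    }
    out.push('\'');
    out
}

/// Build the KILL QUERY statement for a query_id chosen by the application.
fn kill_query_sql(query_id: &str) -> String {
    format!("KILL QUERY WHERE query_id = {}", quote_literal(query_id))
}

fn main() {
    // The id here is made up; in practice it is whatever id was attached
    // to the long-running query when it was started.
    println!("{}", kill_query_sql("ingest-job-42"));
}
```

The resulting string still has to be sent to the server over a separate request, since the original one is busy with the query being cancelled.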
Ping / Pong
While the client does not expose a ping method that uses the default /ping endpoint, perhaps you could just use something like SELECT 1 for health checks?
In addition to simple types, we also rely on complex (semi-structured) column types such as LowCardinality, JSON, Map; full type coverage is preferable of course.
There are examples for all the supported types:
- https://github.com/ClickHouse/clickhouse-rs/blob/main/examples/data_types_derive_simple.rs
- https://github.com/ClickHouse/clickhouse-rs/blob/main/examples/data_types_derive_containers.rs
- https://github.com/ClickHouse/clickhouse-rs/blob/main/examples/data_types_new_json.rs
- https://github.com/ClickHouse/clickhouse-rs/blob/main/examples/data_types_variant.rs
This gap is currently a blocker to adopting Rust in the projects. Nice-to-have, but eventually important: Query packet Settings.
Are we talking about ClickHouse settings? Then you could just use the Client::with_option method (here's a simple example).
Realtime Progress packets and ProfileInfo summary.
This one cannot be simulated, sadly.
Tuning knobs for the connection pool (idle limits, open strategy, etc)
Should be possible with a custom Hyper instance: https://github.com/ClickHouse/clickhouse-rs/blob/main/examples/custom_http_client.rs#L12-L19
Is your data coming in a columnar format already, or row-by-row?
https://pkg.go.dev/github.com/ClickHouse/clickhouse-go/v2/lib/[email protected]#Batch
We buffer rows with the Batch API (AppendStruct / Append). When Send() is called the Go driver converts those rows to a column-oriented Data packet then sends it over tcp port 9000/9440. So the insert path is truly columnar, even if the application feeds rows.
https://clickhouse.com/docs/best-practices/selecting-an-insert-strategy#choose-the-right-format
While flexibility is useful for data engineering and file-based imports, applications should prioritize performance-oriented formats:
- Native format (recommended): Most efficient. Column-oriented, minimal parsing required server-side. Used by default in Go and Python clients.
- RowBinary: Efficient row-based format, ideal if columnar transformation is hard client-side. Used by the Java client.
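To make the row-based option concrete, here is a minimal sketch of how a single row could be laid out in RowBinary, assuming a hypothetical table `(id UInt64, name String)`. Fixed-width integers are written little-endian and strings are prefixed with an unsigned LEB128 length; the function names are illustrative and not taken from any client.

```rust
/// Append an unsigned LEB128 varint, used here for the string length prefix.
fn write_varint(buf: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            buf.push(byte);
            return;
        }
        // More bytes follow, so set the continuation bit.
        buf.push(byte | 0x80);
    }
}

/// Encode one row of the hypothetical table `(id UInt64, name String)`.
fn encode_row(buf: &mut Vec<u8>, id: u64, name: &str) {
    buf.extend_from_slice(&id.to_le_bytes()); // UInt64: 8 bytes, little-endian
    write_varint(buf, name.len() as u64); // String: length prefix...
    buf.extend_from_slice(name.as_bytes()); // ...followed by the raw bytes
}

fn main() {
    let mut buf = Vec::new();
    // Rows are appended one after another; no per-column regrouping is needed.
    encode_row(&mut buf, 1, "ab");
    encode_row(&mut buf, 2, "c");
    println!("{} bytes: {:?}", buf.len(), buf);
}
```

Each row is serialized as soon as it is available, which is why this format suits applications that receive data row by row.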
Are we talking about ClickHouse settings?
The original question was "what TCP protocol features do you consider critical for your case?"
Most of the items in my list referred to TCP-level packets https://clickhouse.com/docs/native-protocol/basics (Data, Query, Progress, ProfileInfo, Cancel, Ping/Pong)
Note: because performance is a strict requirement, we chose the Native TCP protocol over HTTP.
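For readers unfamiliar with the packets named in this list, the sketch below models just that subset as Rust enums. The numeric codes are given as I recall them from the native protocol and should be checked against the protocol documentation; the type and method names are hypothetical and do not come from any existing crate.

```rust
/// Client-to-server packet kinds mentioned above (subset only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum ClientPacket {
    Hello = 0,
    Query = 1,
    Data = 2,
    Cancel = 3,
    Ping = 4,
}

/// Server-to-client packet kinds mentioned above (subset only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerPacket {
    Hello,
    Data,
    Exception,
    Progress,
    Pong,
    EndOfStream,
    ProfileInfo,
}

impl ServerPacket {
    /// Map a decoded packet code to a known kind; unknown codes yield None.
    fn from_code(code: u64) -> Option<Self> {
        match code {
            0 => Some(Self::Hello),
            1 => Some(Self::Data),
            2 => Some(Self::Exception),
            3 => Some(Self::Progress),
            4 => Some(Self::Pong),
            5 => Some(Self::EndOfStream),
            6 => Some(Self::ProfileInfo),
            _ => None,
        }
    }
}

fn main() {
    // A Ping from the client is expected to be answered with a Pong.
    println!("client sends {:?} ({})", ClientPacket::Ping, ClientPacket::Ping as u8);
    println!("server code 4 is {:?}", ServerPacket::from_code(4));
}
```

Progress and ProfileInfo arrive interleaved with Data packets on the same connection, which is what makes them hard to reproduce over a plain HTTP request/response.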
We buffer rows with the Batch API (AppendStruct / Append). When Send() is called the Go driver converts those rows to a column-oriented Data packet then sends it over tcp port 9000/9440. So the insert path is truly columnar, even if the application feeds rows.
Please note that in that case the cost of converting row-oriented data to a column-oriented format is shifted to your backend application. Where you prefer the higher resource consumption, on the app side or on the CH side, is a matter of preference, but you should measure the overall e2e latency of an insert.
Native format (recommended): Most efficient. Column-oriented, minimal parsing required server-side.
It is most efficient for the ClickHouse server. On the app side, it might imply higher resource consumption, since e.g. a single Vec<T> needs to be translated into Vec<T.0>, Vec<T.1>, ..., Vec<T.N>, i.e. a separate collection for each column of your data model, so it can be put into the input blocks appropriately.
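The translation described here can be sketched in a few lines. `Event` and `EventColumns` are hypothetical types standing in for an application's data model; the point is only to show the extra pass and the per-column allocations the app takes on.

```rust
/// Hypothetical row type, as the application would naturally hold it.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    id: u64,
    name: String,
    value: f64,
}

/// One vector per column, the shape a columnar block needs.
#[derive(Debug, PartialEq)]
struct EventColumns {
    ids: Vec<u64>,
    names: Vec<String>,
    values: Vec<f64>,
}

/// Transpose buffered rows into columns: one pass over the rows,
/// one allocation per column.
fn to_columns(rows: Vec<Event>) -> EventColumns {
    let mut cols = EventColumns {
        ids: Vec::with_capacity(rows.len()),
        names: Vec::with_capacity(rows.len()),
        values: Vec::with_capacity(rows.len()),
    };
    for row in rows {
        cols.ids.push(row.id);
        cols.names.push(row.name);
        cols.values.push(row.value);
    }
    cols
}

fn main() {
    let rows = vec![
        Event { id: 1, name: "a".to_string(), value: 0.5 },
        Event { id: 2, name: "b".to_string(), value: 1.5 },
    ];
    let cols = to_columns(rows);
    println!("{:?}", cols);
}
```

With a row-based format this step happens on the server instead, so the comparison that matters is the end-to-end insert latency rather than either side in isolation.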
Note: because performance is a strict requirement, we chose the Native TCP protocol over HTTP.
Let's not confuse TCP protocol/interface and Native data format. Both are historically called Native and that can cause confusion.
Regarding interfaces, there is a very insignificant difference between the HTTP and TCP interfaces, as seen in the Python drivers benchmarks, where both use Native format.
I'd advise doing a simple PoC with the current version of the Rust client to see if performance matches your expectations, as I am fairly certain it will. If you have any questions, please feel free to reach out in the community Slack as well.
As for the TCP protocol feature itself, as I said, we are actively looking into it within the team.
Regarding interfaces, there is a very insignificant difference between the HTTP and TCP interfaces, as seen in the Python drivers benchmarks, where both use Native format.
Only one note, the benchmark measures SELECT performance. It doesn’t include results for bulk INSERT operations.
upd: If/when we have a PoC, either with the current HTTP client or a future TCP version, I will update you on Slack, yes.