clickhouse.rs
Insert data with dynamic columns
I'm writing a Kafka consumer; it consumes messages and inserts them into ClickHouse according to each message's meta info, like:
```json
{
    "table": "ch_table_1",
    "data": [
        {"col_name": "foo", "type": "uint32", "val": "3"},
        {"col_name": "bar", "type": "string", "val": "hello"}
        // ...
    ]
}
```
How do I construct the Row to insert? I didn't find any docs about this.
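For illustration, the envelope above could be deserialized with serde into something like this (a sketch only; the Rust type and field names are illustrative):

```rust
use serde::Deserialize;

// Sketch of the Kafka message envelope shown above.
#[derive(Deserialize)]
struct Message {
    table: String,
    data: Vec<Cell>,
}

#[derive(Deserialize)]
struct Cell {
    col_name: String,
    #[serde(rename = "type")]
    ty: String,
    val: String,
}
```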
Columns can change, innit? In this case, there is no way to do it, because this crate is built around typed structs. For dynamic types you can use https://github.com/suharev7/clickhouse-rs
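For illustration, a minimal sketch with that crate (assuming the Block/Pool API shown in its README; the connection string and table are placeholders):

```rust
use clickhouse_rs::{Block, Pool};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Columns are attached to the Block at runtime, so the schema
    // doesn't have to be known at compile time.
    let block = Block::new()
        .column("foo", vec![3_u32])
        .column("bar", vec!["hello".to_string()]);

    let pool = Pool::new("tcp://localhost:9000/default");
    let mut handle = pool.get_handle().await?;
    handle.insert("ch_table_1", block).await?;
    Ok(())
}
```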
Thx
Hmm, adjacent to this: will PRs for dynamic columns be considered?
We're inserting types with custom Serialize implementations precisely because we have a set of columns that are only known at runtime.
(The same PR also contains work for rewinding the send buffer on serialization failures).
Thanks,
Shenghao
@shenghaoyang, theoretically, Row::COLUMN_NAMES can be replaced with fn columns() -> Vec<Cow<'static, str>> or even -> Cow<'static, [Cow<'static, str>]>.
I would like to avoid stabilizing this, but if it is helpful for your case, I'm ready to consider such changes after discussion.
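For concreteness, a minimal sketch of what that change might look like (hypothetical; the crate's actual Row trait has more to it than this):

```rust
use std::borrow::Cow;

// Hypothetical sketch: replacing the associated constant with a method,
// so column names can be produced at runtime.
trait Row {
    // currently: const COLUMN_NAMES: &'static [&'static str];
    fn columns() -> Vec<Cow<'static, str>>;
}

struct FixedRow;

impl Row for FixedRow {
    fn columns() -> Vec<Cow<'static, str>> {
        // Static names can borrow; runtime-built names would use Cow::Owned.
        vec![Cow::Borrowed("foo"), Cow::Borrowed("bar")]
    }
}
```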
Hey - thanks for the reply.
Our patch was rather simple (I've not used Cow much; I never even realized you could do that) - we added a new Insert::new_dyn() function that accepts an IntoIterator<&str> and uses it to construct an instance of Insert, escaping the table name and column names along the way.
We've not routed both Insert::new() and Insert::new_dyn() through a common implementation yet, but that should be possible - though in that case we'd probably lose the compile-time evaluation of join_column_names()?
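For illustration, the runtime counterpart of that joining could look like this (a sketch only; the function name mirrors the one mentioned above rather than the crate's actual internals, and backslash-escaping of backticks is an assumption here):

```rust
// Sketch: join runtime-supplied column names into the column list of an
// INSERT statement, backtick-quoting each identifier. The escaping rule
// (backslash before backslashes and backticks) is an assumption.
fn join_column_names<'a>(columns: impl IntoIterator<Item = &'a str>) -> String {
    columns
        .into_iter()
        .map(|c| format!("`{}`", c.replace('\\', "\\\\").replace('`', "\\`")))
        .collect::<Vec<_>>()
        .join(",")
}

// join_column_names(["foo", "bar"]) == "`foo`,`bar`"
```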
Our usage scenario is forwarding metrics data from collectors (think Telegraf, Prometheus Node Exporter, etc.) into ClickHouse, batching up rows with identical schemas and then sending them off in one large batch for efficiency.
We don't know the schema of the data that will be received ahead of time - so we can't implement Row - and the schemas of the ClickHouse tables are also unknown at compile time, which makes this extra fun 😄. We're basically building a Wrapper implementing Serialize on the fly that can be passed into write() - something like write(Wrapper::new(&row, &how_to_transform_metric_row_to_table_row)).
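A rough sketch of such a wrapper (the Value enum and names are hypothetical, and serializing the row as a serde tuple rather than a struct is an assumption about what the crate's serializer will accept):

```rust
use serde::ser::{Serialize, SerializeTuple, Serializer};

// Hypothetical cell type covering whatever the collectors can produce.
enum Value {
    UInt32(u32),
    Text(String),
}

// On-the-fly row: cells already ordered to match the target table.
struct Wrapper<'a> {
    cells: &'a [Value],
}

impl Serialize for Wrapper<'_> {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        let mut row = serializer.serialize_tuple(self.cells.len())?;
        for cell in self.cells {
            match cell {
                Value::UInt32(v) => row.serialize_element(v)?,
                Value::Text(v) => row.serialize_element(v)?,
            }
        }
        row.end()
    }
}
```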
We also needed to implement unwinding of Insert::write()s, because we perform data type conversions for each row during the serialization process (to avoid buffering, scanning the row twice, etc.), and some of them can fail due to things like range errors - cancelling the insert of a whole batch over one or two bad rows is a bit wasteful...
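Something along these lines, as a minimal self-contained sketch (the closure stands in for whatever actually serializes a row into the send buffer):

```rust
// Sketch: checkpoint the in-memory send buffer before serializing a row,
// and truncate back to the checkpoint if serialization fails, so one bad
// row doesn't poison the rest of the batch.
fn write_row<E>(
    buf: &mut Vec<u8>,
    serialize: impl FnOnce(&mut Vec<u8>) -> Result<(), E>,
) -> Result<(), E> {
    let checkpoint = buf.len();
    match serialize(buf) {
        Ok(()) => Ok(()),
        Err(e) => {
            buf.truncate(checkpoint); // rewind: drop the partial row
            Err(e)
        }
    }
}
```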
Maybe it'd be better if I submit a PR and we can discuss there?
Thanks,
Shenghao