clickhouse.rs
clickhouse.rs copied to clipboard
Use `RowBinaryWithNamesAndTypes`
Use RowBinaryWithNamesAndTypes
instead of RowBinary
as a primary format in order to support deserialize_any
and, hence, serde(flatten)
and type conversions & validation.
Hi! Any progress with this? May be you need a beta-tester? :)
Looks like a very important enhancement. Sometimes I get NotEnoughData on a query that works with another time bind argument, and works in clickhouse-client without any issues. And not a hint why.
@loyd do you have a branch for the RowBinaryWithNamesAndTypes
work you can share? Happy to help contribute, but also don't want to start-from-scratch if you're already close :-)
@loyd this is a super-hack (but min number of changes) that can get us cell-by-cell fetching of data. I see a few issues with this approach (besides the hackiness, but some of it can be cleaned up):
- The
?fields
marker for theSELECT
statement is no longer supported because there is no type to get the fields from. I don't think this is a hug deal, but open to feedback. - If you attempt to get a cell by the wrong type, it could start reading data from the next cell, or return a "not enough data" error. I don't think this is a huge deal either, because the same thing happens if you specify the wrong type now.
Going with RowBinaryWithNamesAndTypes
is going to be tough because we'd have to translate the string names to the actual types... not impossible, but annoying.
I welcome any/all feedback... thanks!
Providing support for deserializing different rows into different types is the foot gun; CH always returns specific data shapes without any variation.
Moving to RowBinaryWithNamesAndTypes
is an expected improvement not only for supporting deserialize_any,
but, first of all, for validation purposes to prevent a schema violation (e.g., #100).
It's not a problem to replace RowBinary
with RowBinaryWithNamesAndTypes
, but we need to decide in which cases is conversion is allowed and when an error should rise.
Support for SELECT
s and INSERT
s is different. During INSERT
, the conversion is performed on the CH side. Moreover, the behavior is not specified directly, only
If setting input_format_with_types_use_header is set to 1, the types from input data will be compared with the types of the corresponding columns from the table. Otherwise, the second row will be skipped.
So, I need to check the actual behavior (and maybe check other libraries).
On the SELECT
s, the conversion is performed on the client side and we have more control over it.
Also, the behavior should remain same in case of moving to TCP+Native
.