clickhouse.rs icon indicating copy to clipboard operation
clickhouse.rs copied to clipboard

Use `RowBinaryWithNamesAndTypes`

Open loyd opened this issue 3 years ago • 4 comments

Use RowBinaryWithNamesAndTypes instead of RowBinary as a primary format in order to support deserialize_any and, hence, serde(flatten) and type conversions & validation.

loyd avatar Apr 20 '21 07:04 loyd

Hi! Any progress with this? May be you need a beta-tester? :)

Looks like a very important enhancement. Sometimes I get NotEnoughData on a query that works with another time bind argument, and works in clickhouse-client without any issues. And not a hint why.

makorne avatar Oct 01 '21 05:10 makorne

@loyd do you have a branch for the RowBinaryWithNamesAndTypes work you can share? Happy to help contribute, but also don't want to start-from-scratch if you're already close :-)

wspeirs avatar Oct 18 '23 18:10 wspeirs

@loyd this is a super-hack (but min number of changes) that can get us cell-by-cell fetching of data. I see a few issues with this approach (besides the hackiness, but some of it can be cleaned up):

  • The ?fields marker for the SELECT statement is no longer supported because there is no type to get the fields from. I don't think this is a hug deal, but open to feedback.
  • If you attempt to get a cell by the wrong type, it could start reading data from the next cell, or return a "not enough data" error. I don't think this is a huge deal either, because the same thing happens if you specify the wrong type now.

Going with RowBinaryWithNamesAndTypes is going to be tough because we'd have to translate the string names to the actual types... not impossible, but annoying.

I welcome any/all feedback... thanks!

wspeirs avatar Oct 18 '23 20:10 wspeirs

Providing support for deserializing different rows into different types is the foot gun; CH always returns specific data shapes without any variation.

Moving to RowBinaryWithNamesAndTypes is an expected improvement not only for supporting deserialize_any, but, first of all, for validation purposes to prevent a schema violation (e.g., #100).

It's not a problem to replace RowBinary with RowBinaryWithNamesAndTypes, but we need to decide in which cases is conversion is allowed and when an error should rise.

Support for SELECTs and INSERTs is different. During INSERT, the conversion is performed on the CH side. Moreover, the behavior is not specified directly, only

If setting input_format_with_types_use_header is set to 1, the types from input data will be compared with the types of the corresponding columns from the table. Otherwise, the second row will be skipped.

So, I need to check the actual behavior (and maybe check other libraries).

On the SELECTs, the conversion is performed on the client side and we have more control over it.

Also, the behavior should remain same in case of moving to TCP+Native.

loyd avatar Jan 27 '24 15:01 loyd