clickhouse-activerecord icon indicating copy to clipboard operation
clickhouse-activerecord copied to clipboard

Complex type support

Open leboshi opened this issue 4 months ago • 0 comments

More of an RFC than an issue report... We have a branch with simple Tuple support for groupings of basic types, like Tuple(String, UInt64). Our actual use case involves Tuples that include Arrays, though, so we're working on more sophisticated type handling. We were wondering if you had some thoughts on design, @PNixx.

Problem

The first problem is that the regex matches for %r(Array), %r(Map), and now %r(Tuple) just return the first match from the TypeMap. Depending on the order types are registered, an Array(Tuple(...)) might return a Tuple instead of an Array as the top-level type. And even if it does correctly detect that an Array should be the top-level type, there are other type detection problems because of how Array detects subtypes: if the type is Array(Tuple(String, UInt64)), it would currently be treated as Array(UInt64).

Proposal

What we're leaning toward is a full AST grammar for type definitions. This would let us correctly handle all the composition.

One question we have is how much validation you think it makes sense to do in Ruby: Nullable(Array(...)) is valid syntax, but an illegal type definition. In the interest of keeping the gem (relatively) lightweight and not duplicating all the logic in the server, we think it makes sense to only worry about syntax parsing in the gem and leave the validity of type definitions to the server.

AST-based type parsing would require a custom TypeMap implementation too, to let us do things like register_decorator for Nullable, LowCardinality, etc. Does that make sense to you as an API? We wouldn't want to actually build out support for all the other types yet, but we'd want to make the TypeMap flexible enough to implement them later.

Breaking Changes

We've already run into things that have previously been working by accident. For example, we have a column order_ids AggregateFunction(groupUniqArray, UInt64). With the current implementation, this hits the %r(Array) regex, the Array type detects the UInt64 in the type string, and the adapter registers the column type as an array of UInt64. We've been querying SELECT groupUniqArrayMerge(order_ids) AS order_ids, using the original column name for convenience. The adapter casts the values correctly as Array(UInt64), but only by accident.

(Also unexpected in this specific case is that the adapter has no problem pulling in individual records, ungrouped, with the binary data in the order_ids column because it thinks the column is just an Array(UInt64). However, because of the way Array deserialization is currently written to handle its own recursion with if value.is_a?(::Array), it doesn't care that the value from the database isn't an array and just returns the raw binary string .to_i, which usually ends up being 0, not even [0].)

leboshi avatar Aug 15 '25 18:08 leboshi