jaeger-clickhouse
[Feature]: Support Native JSON columns in Clickhouse
Requirement
As a Clickhouse analytics user, I want the jaeger-clickhouse schema to allow Clickhouse native JSON columns, so that we can query the data in Clickhouse more efficiently (in terms of both performance and query simplicity).
Problem
Currently, jaeger-clickhouse stores JSON span data in a String column, so querying on fields inside that column means going through Clickhouse's JSON functions. This gets verbose quickly, especially once you go past 2 levels of nesting.
This is most evident when you query the ingested data to generate your own analytics/insights. It would be nice if jaeger-clickhouse added support for Clickhouse native JSON columns.
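To make the difference concrete, here is a minimal sketch that builds the two kinds of field expressions for the same nested path. The column name `model` and the path `process.serviceName` are assumptions used only for illustration, and the helper functions are hypothetical (they are not part of jaeger-clickhouse). `JSONExtractString` is an existing Clickhouse function that accepts a list of keys; the dotted form assumes a column of the native JSON type.

```python
def string_column_expr(column, path):
    """Build the expression for reading a nested field from a String column.

    Assumes every key in `path` is a plain identifier (no quoting/escaping is done).
    """
    keys = ", ".join(f"'{key}'" for key in path)
    return f"JSONExtractString({column}, {keys})"


def json_column_expr(column, path):
    """Build the expression for the same field on a native JSON column (dotted path)."""
    return ".".join([column, *path])


# Illustrative path only; the actual span layout may differ.
path = ["process", "serviceName"]
print(string_column_expr("model", path))  # JSONExtractString(model, 'process', 'serviceName')
print(json_column_expr("model", path))    # model.process.serviceName
```

With a native JSON column the field reads like an ordinary column reference, which is what keeps filters and aggregations over nested span data short.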
Proposal
One solution may be to add support for the native JSON datatype (it's still "experimental", but the spec has been quite stable for a while).
Open questions
The major open question is how this would affect the split between protobuf- and JSON-encoded data (currently, the String column supports both) and whether it would add more complexity to the project. I need to look into this more to see the impact, but wanted to raise it with the community/maintainers to get an idea of their thoughts.
@navinpai I agree that, from a read perspective, reading a field from a JSON column is faster than reading it from a String column. From a write perspective, though, inserting into JSON columns is more expensive, and hence slower, than inserting into String columns. In metrics/logs/tracing systems we generally do far more writes than reads, so I feel the String datatype is more appropriate. Feel free to correct me if my understanding is wrong.