jaeger-clickhouse

[Feature]: Support Native JSON columns in Clickhouse

Open • navinpai opened this issue 1 year ago • 1 comment

Requirement

As a Clickhouse analytics user, I want the jaeger-clickhouse schema to support Clickhouse native JSON columns so that we can query data in Clickhouse more efficiently, in terms of both performance and query simplicity.

Problem

Currently, jaeger-clickhouse stores JSON span data in a String column. Querying fields inside that column is quite verbose, because every access has to go through Clickhouse's JSON functions, and it gets worse once you go past two levels of nesting.

This is especially evident when you query the ingested data to generate your own analytics or insights. It would be nice if jaeger-clickhouse added support for Clickhouse native JSON columns.
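
To make the verbosity concrete, here is a minimal sketch that builds the two kinds of column expression for the same nested field. The column name `model` and the field path `process.serviceName` are assumptions for illustration, not taken from the actual schema; `JSONExtractString` is a Clickhouse JSON function that accepts a chain of keys, and the dotted form assumes sub-column access on a native JSON column.

```python
from typing import Sequence


def quote(key: str) -> str:
    """Quote a key as a single-quoted SQL string literal."""
    return "'" + key.replace("\\", "\\\\").replace("'", "\\'") + "'"


def string_column_expr(column: str, path: Sequence[str]) -> str:
    """Expression for a field stored inside a String column holding JSON."""
    keys = ", ".join(quote(k) for k in path)
    return f"JSONExtractString({column}, {keys})"


def native_json_expr(column: str, path: Sequence[str]) -> str:
    """Expression for the same field on a native JSON column (dotted access)."""
    return ".".join([column, *path])


# Hypothetical column name and field path, for illustration only.
path = ["process", "serviceName"]
print(string_column_expr("model", path))
print(native_json_expr("model", path))
```

Each extra level of nesting adds another quoted key to the function call in the String case, while the native JSON form only grows by one dotted segment.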

Proposal

A solution may be to start supporting the native JSON data type (it's still "experimental", but the spec has been quite stable for a while).

Open questions

The major open question is how this would affect the split between protobuf-encoded and JSON-encoded data (currently, the String column supports both) and whether it would add more complexity to the project. I need to look into this more to understand the impact, but I wanted to raise it with the community and maintainers to get their thoughts.

navinpai, Jul 14 '22 12:07

@navinpai I agree that, from a read perspective, reading a field from a JSON column is faster than reading it from a String column. From a write perspective, though, inserting into JSON columns is more expensive, and hence slower, than inserting into String columns. In metrics/logs/tracing systems we generally do far more writes than reads, so I feel the String data type is more appropriate. Feel free to correct me if my understanding is wrong.
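
The trade-off can be written as a rough break-even check. This is only a sketch under a simplified cost model: `extra_write_cost` (added cost per inserted row for a JSON column) and `read_saving` (cost saved per row read) are hypothetical per-row units, and the numbers below are placeholders, not measurements.

```python
def break_even_reads_per_write(extra_write_cost: float, read_saving: float) -> float:
    """Reads per write above which the native JSON column comes out ahead."""
    if read_saving <= 0:
        raise ValueError("read_saving must be positive")
    return extra_write_cost / read_saving


def native_json_pays_off(writes: int, reads: int,
                         extra_write_cost: float, read_saving: float) -> bool:
    """True when the total read saving exceeds the total extra write cost."""
    return reads * read_saving > writes * extra_write_cost


# Placeholder values: a write-heavy workload with 10 writes per read.
print(break_even_reads_per_write(3.0, 1.5))
print(native_json_pays_off(writes=1000, reads=100,
                           extra_write_cost=3.0, read_saving=1.5))
```

Under this model a write-heavy tracing workload favours String unless the per-read saving is much larger than the per-write penalty, which would have to be measured on real data.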

chhetripradeep, Nov 27 '22 10:11