Performance of inserting into Object('JSON') is bad with variable JSON
CREATE TABLE default.prom_extend_json ON CLUSTER abc (
    `timestamp` DateTime,
    `value` Float64,
    `__name__` String,
    `labels` JSON
) ENGINE=MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (__name__, timestamp);
CREATE TABLE default.dist_prom_extend_json ON CLUSTER abc AS prom_extend_json ENGINE = Distributed(abc, default, prom_extend_json);
Inserting a batch of 16384 records takes 48.275 seconds on average.
CPU usage on the clickhouse-server side is negligible.
The client side sits at 100% CPU; its pprof profile is attached.

Are your JSON objects consistent, with mostly the same fields? insertEmptyColumn should only be called when a row has a new field that is not already present in the batch.
Each of my JSON objects has 3 fields. There are 1000 different fields across all JSON objects.
The following code converts a fastjson.Value to a Go object, which is later passed to (*batch).Append.
import (
	"github.com/valyala/fastjson"
)

// val2map flattens a fastjson object into a map for (*batch).Append.
// Only string and number fields are copied; fields that fail conversion are skipped.
func val2map(v *fastjson.Value) (m map[string]interface{}) {
	var err error
	var obj *fastjson.Object
	if obj, err = v.Object(); err != nil {
		return
	}
	m = make(map[string]interface{}, obj.Len())
	obj.Visit(func(key []byte, v *fastjson.Value) {
		strKey := string(key)
		switch v.Type() {
		case fastjson.TypeString:
			var vb []byte
			if vb, err = v.StringBytes(); err != nil {
				return
			}
			m[strKey] = string(vb)
		case fastjson.TypeNumber:
			var f float64
			if f, err = v.Float64(); err != nil {
				return
			}
			m[strKey] = f
		}
	})
	return
}
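For reference, a minimal sketch of the insert path this map feeds into, assuming clickhouse-go v2's PrepareBatch/Append/Send API; the connection handling, the sample struct and its field names are placeholders, not my actual benchmark code:

import (
	"context"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
	"github.com/valyala/fastjson"
)

// sample is a stand-in for one scraped metric.
type sample struct {
	ts     time.Time
	value  float64
	name   string
	labels *fastjson.Value
}

func insertSamples(ctx context.Context, conn driver.Conn, samples []sample) error {
	batch, err := conn.PrepareBatch(ctx, "INSERT INTO dist_prom_extend_json")
	if err != nil {
		return err
	}
	for _, s := range samples {
		// Appending a map whose key set differs from earlier rows forces the
		// JSON column to be restructured client-side; with ~1000 distinct
		// label names this is likely where the 100% client CPU goes.
		if err := batch.Append(s.ts, s.value, s.name, val2map(s.labels)); err != nil {
			return err
		}
	}
	return batch.Send()
}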
I evaluated using 1000 Nullable(String) columns here.
So the challenge is that each append may require us to update the batch retrospectively: suppose we add rows 1, 2 and 3 with fields A, B and C. Adding row 4 with a new field D requires the previous rows to be updated, since the protocol requires all columns to be of the same length. This worsens as the batch gets bigger.
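To make the retrospective updates concrete, here is a toy column-wise batch (not the real clickhouse-go JSON column, just an illustration of the shape of the problem):

// Toy illustration: the batch stores one slice per field, and all slices must
// stay the same length, so a new field triggers a backfill of nils and every
// missing field needs a forward-fill slot on each row.
type toyJSONBatch struct {
	rows    int
	columns map[string][]interface{}
}

func (b *toyJSONBatch) appendRow(row map[string]interface{}) {
	if b.columns == nil {
		b.columns = make(map[string][]interface{})
	}
	for key, val := range row {
		col, ok := b.columns[key]
		if !ok {
			// New field: backfill nil for every row already in the batch.
			col = make([]interface{}, b.rows)
		}
		b.columns[key] = append(col, val)
	}
	b.rows++
	// Forward fill: fields missing from this row still need a nil slot.
	for key, col := range b.columns {
		if len(col) < b.rows {
			b.columns[key] = append(col, nil)
		}
	}
}

With 16384 rows and roughly 1000 sparse fields, that per-append backfill and forward fill adds up quickly.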
You could insert as a string, deferring the work to ClickHouse and not placing the load on the client. This code definitely needs to be optimized, and I suspected non-consistent/non-uniform JSON would be slow. I'll see if we can do this faster though.
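A minimal sketch of that workaround, assuming a hypothetical variant of the table (here called dist_prom_extend_str) where labels is declared as String; the raw JSON text is appended as-is and ClickHouse parses it server-side when queried, for example with the JSONExtract* functions:

import (
	"context"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// rawSample carries the label set as the original JSON text.
type rawSample struct {
	ts         time.Time
	value      float64
	name       string
	labelsJSON string
}

func insertRaw(ctx context.Context, conn driver.Conn, rows []rawSample) error {
	batch, err := conn.PrepareBatch(ctx, "INSERT INTO dist_prom_extend_str")
	if err != nil {
		return err
	}
	for _, r := range rows {
		// No val2map and no per-row column restructuring: the client just ships bytes.
		if err := batch.Append(r.ts, r.value, r.name, r.labelsJSON); err != nil {
			return err
		}
	}
	return batch.Send()
}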
The way to do this is probably to enhance the column format so it can store backfill and forward-fill positions; we could then use these during flush and encoding to save the work.
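Purely as a hypothetical sketch of that idea (not the actual clickhouse-go column code), each field's sub-column could record its first-appearance row and the rows where it was absent, and only materialize the padding once at encode time:

// Hypothetical lazy JSON sub-column: padding positions are recorded on Append
// and only materialized during flush/encoding.
type lazyJSONColumn struct {
	values []interface{}
	// firstRow is the batch row at which this field first appeared; rows
	// before it are backfilled with nil at encode time.
	firstRow int
	// gaps lists rows >= firstRow where the field was absent (forward fill).
	gaps []int
}

func (c *lazyJSONColumn) encode(totalRows int) []interface{} {
	out := make([]interface{}, 0, totalRows)
	for i := 0; i < c.firstRow; i++ {
		out = append(out, nil) // backfill
	}
	vi, gi := 0, 0
	for row := c.firstRow; row < totalRows; row++ {
		if gi < len(c.gaps) && c.gaps[gi] == row {
			out = append(out, nil) // forward fill
			gi++
			continue
		}
		out = append(out, c.values[vi])
		vi++
	}
	return out
}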