cortex
Send strings once per Push call
Each timeseries sent from distributor to ingester in a Push() call has a set of label-value pairs, which are strings that tend to repeat a lot because there will be a lot of metrics in the same "family".
Instead, we could send a table of strings then encode the label-value set as indexes into that table.
The benefits would be less garbage in the ingester and less network traffic, at the cost of extra work on send. Currently we turn on gRPC gzip compression to save network bandwidth, so this idea might replace that and is probably cheaper.
If this is a good idea, we could do the same in Prometheus remote_write.
Hi, is this still a valid issue? I'd like to take this one.
Yes, still valid.
I suspect it will take several iterations to find the best balance. You will also probably have to do some groundwork on repeatable test cases and benchmarks so we can be sure about the benefit. However, even a little bit of investigation or prototyping is probably useful - please let us know what you find.
Hi @ChinYing-Li , is there any progress for this issue?
Hi @xiaobeiyang , I haven't started working on this, so feel free to pick it up if you wish.
Funny, I had exactly the same thought, but for another component. I see that most of the querier's CPU and memory goes to deserializing labels when getting data from the ingester and/or store gateway, so I thought: why not send the symbols first and then just references? It might help.