cratedb-prometheus-adapter
Improve storage by normalizing the current metrics table
As pointed out by @proddata in some private conversations, we could further reduce storage usage by normalizing how we store the data in CrateDB.
I did some testing on my local Prometheus setup, and we currently store the same information (the label set) many times, e.g.:
SELECT
count(*)
FROM
"doc"."metrics"
WHERE
labels_hash = '478d4639912fc742' LIMIT 10
-- 36_558
In my 33M-row dataset, this particular label hash appears 36,558 times. That means that (in this specific case) we are unnecessarily writing the following object 36,557 times:
{
  instance: "some_domain",
  __name__: "probe_http_content_length",
  job: "blackbox"
}
Also, my test dataset contains only 939 unique label hashes, so we could potentially avoid writing objects like this one (33 × 10^6 − 939) times.
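A minimal Python sketch of the normalization idea, assuming a separate label-set table keyed by `labels_hash` plus slim sample rows that only reference the hash. The helper names (`labels_hash`, `normalize`, `redundant_label_writes`) and the BLAKE2b-based hash are hypothetical stand-ins for illustration, not the adapter's actual schema or hashing function.

```python
import hashlib
import json
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def labels_hash(labels: Labels) -> str:
    """Hypothetical 16-hex-char hash of a label set (NOT the adapter's real hash)."""
    # Sort keys so the same label set always hashes identically.
    canonical = json.dumps(labels, sort_keys=True, separators=(",", ":"))
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=8).hexdigest()


def normalize(samples: List[Tuple[Labels, int, float]]):
    """Split denormalized samples into a label-set table and slim sample rows."""
    label_sets: Dict[str, Labels] = {}       # labels_hash -> labels, stored once
    rows: List[Tuple[str, int, float]] = []  # (labels_hash, timestamp_ms, value)
    for labels, ts, value in samples:
        h = labels_hash(labels)
        label_sets.setdefault(h, labels)     # only the first occurrence is kept
        rows.append((h, ts, value))
    return label_sets, rows


def redundant_label_writes(total_rows: int, unique_label_sets: int) -> int:
    """Label objects the current layout writes that a normalized one would not."""
    return total_rows - unique_label_sets


# Usage: three samples of the same series collapse to one stored label set.
probe = {
    "instance": "some_domain",
    "__name__": "probe_http_content_length",
    "job": "blackbox",
}
samples = [(probe, 1_000 * i, 42.0) for i in range(3)]
label_sets, rows = normalize(samples)
print(len(label_sets), len(rows))                # 1 3
print(redundant_label_writes(33_000_000, 939))   # 32999061
```

The trade-off is that reads have to join the sample rows back to the label-set table (or look label sets up separately), in exchange for writing each label object only once.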