cratedb-prometheus-adapter

Improve storage by normalizing the current metrics table

Open surister opened this issue 1 year ago • 7 comments

As pointed out by @proddata in some private conversations, we could further reduce the storage usage by normalizing how we store the data in CrateDB.

I did some testing on my local Prometheus setup, and we currently store the same information, the label set, many times, e.g.:

SELECT
  count(*)
FROM
  "doc"."metrics"
WHERE
  labels_hash = '478d4639912fc742' LIMIT 10
-- 36_558

In my 33M-row dataset, this particular label hash is written about 36k times (36,558 rows), which means that, in this specific case, we write the following object 36,557 times unnecessarily.

{
  instance: "some_domain"
  __name__: "probe_http_content_length"
  job: "blackbox"
}

Also, my test dataset contains only 939 unique label hashes, so we could potentially avoid writing objects like that roughly (33x10^6 - 939) times.
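
To make the idea concrete, here is a minimal sketch of what the normalized layout could look like. It uses Python's built-in sqlite3 purely as a stand-in for CrateDB so it can be run anywhere. The table names `metric_labels` and `metric_samples`, the column names other than `labels_hash`, and the toy rows are all assumptions for illustration, not the adapter's actual schema.

```python
import json
import sqlite3

# Toy samples shaped like the current denormalized rows: every sample carries
# the full label object. Column layout and values are illustrative assumptions.
labels = {
    "instance": "some_domain",
    "__name__": "probe_http_content_length",
    "job": "blackbox",
}
rows = [
    (ts, "478d4639912fc742", json.dumps(labels, sort_keys=True), float(ts))
    for ts in range(5)
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical lookup table: one row per unique label set.
    CREATE TABLE metric_labels (
        labels_hash TEXT PRIMARY KEY,
        labels      TEXT NOT NULL
    );
    -- Hypothetical sample table: stores only the hash, not the label object.
    CREATE TABLE metric_samples (
        ts          INTEGER NOT NULL,
        labels_hash TEXT NOT NULL,
        value       REAL
    );
    """
)

for ts, labels_hash, labels_json, value in rows:
    # The label object is written only the first time its hash is seen.
    conn.execute(
        "INSERT OR IGNORE INTO metric_labels VALUES (?, ?)",
        (labels_hash, labels_json),
    )
    conn.execute(
        "INSERT INTO metric_samples VALUES (?, ?, ?)",
        (ts, labels_hash, value),
    )

label_rows = conn.execute("SELECT count(*) FROM metric_labels").fetchone()[0]
sample_rows = conn.execute("SELECT count(*) FROM metric_samples").fetchone()[0]

# Reads recover the labels by joining on the hash.
joined = conn.execute(
    """
    SELECT s.ts, s.value, l.labels
    FROM metric_samples AS s
    JOIN metric_labels AS l ON l.labels_hash = s.labels_hash
    ORDER BY s.ts
    """
).fetchall()

print(label_rows, sample_rows)
```

With this split, the label object is stored once per unique hash (939 rows in my dataset) instead of once per sample, at the cost of a join on `labels_hash` when reading.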

surister, Oct 14 '24 15:10