Store-gateway memory usage

Open kolesnikovae opened this issue 1 year ago • 1 comments

In large-scale deployments, the store-gateway service is prone to OOM issues because of the way we handle TSDB indexes: once a block is opened, its TSDB index is loaded into memory and never released until shutdown.

The issue is aggravated by the fact that the index:

  • is not sharded: all series of a tenant's block are stored in a single blob
  • is not optimized for reading from object storage: the whole index has to be loaded into memory before it can be accessed

This makes it hard to cache the index efficiently: even if we pulled it from a very fast cache, we would still need enough memory to hold every index touched by the query.

kolesnikovae avatar Mar 19 '24 07:03 kolesnikovae

I think we can copy large parts of Mimir's lazy loading of blocks:

https://grafana.com/docs/mimir/latest/references/architecture/binary-index-header/
https://github.com/grafana/mimir/tree/main/pkg/storegateway/indexheader

There are some design documents, but they are internal only (sorry, non-GL employees):

https://docs.google.com/document/d/1fFYJzYslSSui8Nswfkh6Ez1hKwRIHkqJTwMXb8v3zyk/edit#heading=h.e7syi5b4uppu
https://docs.google.com/document/d/1N9XpyOD6x9Y6XuXl80wM7Wy6gjZvlEsznJxuP3JiQ30/edit#heading=h.l6q8kfbn269o

simonswine avatar Mar 21 '24 10:03 simonswine