quickwit

Add an SSD cache

Open • fulmicoton opened this issue 1 year ago • 1 comment

Right now, Quickwit's only long-term caches are:

  • a cache used to store split footers (which contain the so-called hotcache)
  • a cache for fast field data (a column-oriented representation of a field)

Both are extremely good candidates for caching, as the same piece of data is often accessed by different queries. Right now, those caches are in RAM only.

For fast fields, this is quite a problem, as they can be bulky. We would like to be able to cache fast fields (and possibly more pieces of data in the future) on an SSD.

More info

  • The keys of this cache have the form (PathBuf, Range<usize>), where the range represents a range of bytes.
  • The data is immutable.
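Since the data is immutable, each entry can live in its own file whose name is derived from its key. A minimal sketch, assuming one file per (path, range) entry in a flat cache directory; `fnv1a64` and `cache_file_name` are hypothetical helpers, not Quickwit code:

```rust
use std::ops::Range;
use std::path::Path;

/// FNV-1a (64-bit). Chosen because its output is fixed by its definition,
/// unlike `std::collections::hash_map::DefaultHasher`, whose algorithm is
/// not guaranteed to stay the same across Rust releases.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: turns a `(path, byte range)` key into a flat file
/// name inside the cache directory. Two different paths may collide on the
/// 64-bit hash, so a real implementation should also store the full key in
/// the entry and compare it on read.
fn cache_file_name(path: &Path, range: &Range<usize>) -> String {
    let path_hash = fnv1a64(path.to_string_lossy().as_bytes());
    format!("{:016x}-{}-{}.bin", path_hash, range.start, range.end)
}

fn main() {
    let name = cache_file_name(Path::new("split_01.fast"), &(100..200));
    // 16 hex digits for the path hash, then the byte range.
    println!("{}", name);
}
```

A stable hash matters here because entries have to be found again after a restart.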

Pitfalls

  • The cache should be resilient to corruption. Validating a checksum on read might be wise.
  • Killing the process in the middle of a download should not result in an invalid state (e.g. a truncated value in the cache) or a leak (files that need to be removed manually).
  • The number of file descriptors should be limited.
  • Restarting should be fast enough (tm).
  • It is OK to assume that only one process writes into the cache (but a write lock should be added; right now there is none).
  • A configured capacity in bytes should be strictly respected.
  • Ideally, the eviction/admission logic should be decoupled and stackable with the RAM cache.
  • We do not trust atime/mtime/ctime.
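The corruption and killed-process pitfalls are commonly handled by writing each entry to a temporary file, renaming it into place only once complete, and storing a checksum next to the data. A minimal std-only sketch; the file layout (8-byte little-endian checksum followed by the payload), the `.tmp` suffix and the function names are all hypothetical, and FNV-1a stands in for a proper checksum such as CRC32C:

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Stand-in checksum (FNV-1a, 64-bit); a real cache would likely use CRC32C
/// or xxHash.
fn checksum(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Writes an entry as `[8-byte LE checksum][payload]`. The data goes to a
/// `.tmp` file first and is renamed to its final name only once fully written
/// and synced, so readers never see a truncated entry.
fn put_entry(dir: &Path, file_name: &str, payload: &[u8]) -> io::Result<()> {
    let tmp_path = dir.join(format!("{}.tmp", file_name));
    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(&checksum(payload).to_le_bytes())?;
        tmp.write_all(payload)?;
        tmp.sync_all()?;
    } // the file descriptor is closed here
    // On POSIX filesystems, rename within a directory replaces the target atomically.
    fs::rename(&tmp_path, dir.join(file_name))
}

/// Returns the payload if the entry exists and its checksum matches.
/// A corrupted entry is deleted and reported as a miss.
fn get_entry(dir: &Path, file_name: &str) -> io::Result<Option<Vec<u8>>> {
    let path = dir.join(file_name);
    let mut buf = Vec::new();
    match File::open(&path) {
        Ok(mut file) => {
            file.read_to_end(&mut buf)?;
        }
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(err) => return Err(err),
    }
    if buf.len() >= 8 {
        let mut header = [0u8; 8];
        header.copy_from_slice(&buf[..8]);
        if checksum(&buf[8..]) == u64::from_le_bytes(header) {
            return Ok(Some(buf[8..].to_vec()));
        }
    }
    // Too short to hold a header, or checksum mismatch.
    fs::remove_file(&path)?;
    Ok(None)
}

/// Called at startup: deletes `.tmp` files left behind by a killed process.
fn remove_leftover_tmp_files(dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|ext| ext.to_str()) == Some("tmp") {
            fs::remove_file(&path)?;
            removed += 1;
        }
    }
    Ok(removed)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("ssd_cache_sketch");
    fs::create_dir_all(&dir)?;
    remove_leftover_tmp_files(&dir)?;
    put_entry(&dir, "entry.bin", b"fast field bytes")?;
    assert_eq!(get_entry(&dir, "entry.bin")?, Some(b"fast field bytes".to_vec()));
    Ok(())
}
```

Only complete files ever get their final name, so a killed process leaves at most `.tmp` files behind, and those are swept at startup. Each file is opened only for the duration of one read or write, which keeps the number of open descriptors bounded. For the rename itself to survive a power loss on Linux, the directory would also need an fsync; that is left out here.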

External pointers

There are a bunch of caches in Quickwit. The one discussed here is:

  • https://github.com/quickwit-oss/quickwit/blob/03d51183c8dc26b8b17afc3e082fd985917223e1/quickwit-storage/src/cache/quickwit_cache.rs

Right now it only applies to .fast files, and it should remain this way for this ticket. It is relatively easy to add a new SSD-based implementation of https://github.com/quickwit-oss/quickwit/blob/03d51183c8dc26b8b17afc3e082fd985917223e1/quickwit-storage/src/cache/mod.rs#L58-L66.
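Beyond file I/O, such an implementation needs bookkeeping that enforces the byte capacity without relying on atime/mtime/ctime. A minimal sketch, assuming a hypothetical `CapacityIndex` type (not part of Quickwit, and not the trait linked above) that orders entries with an in-process logical clock and reports which keys to evict:

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Range;
use std::path::PathBuf;

/// Cache key, mirroring the `(PathBuf, Range<usize>)` form from the issue.
type CacheKey = (PathBuf, Range<usize>);

/// Hypothetical bookkeeping for an SSD cache: keeps total size within a
/// strict byte budget and orders entries by an in-process logical clock
/// rather than by filesystem atime/mtime/ctime.
struct CapacityIndex {
    capacity_bytes: usize,
    used_bytes: usize,
    clock: u64,
    /// key -> (entry size in bytes, tick of last access)
    entries: HashMap<CacheKey, (usize, u64)>,
    /// tick of last access -> key; the first entry is the least recently used
    lru: BTreeMap<u64, CacheKey>,
}

impl CapacityIndex {
    fn new(capacity_bytes: usize) -> Self {
        CapacityIndex {
            capacity_bytes,
            used_bytes: 0,
            clock: 0,
            entries: HashMap::new(),
            lru: BTreeMap::new(),
        }
    }

    fn tick(&mut self) -> u64 {
        self.clock += 1;
        self.clock
    }

    /// Records a read hit. Returns `false` if the key is not cached.
    fn touch(&mut self, key: &CacheKey) -> bool {
        let now = self.tick();
        match self.entries.get_mut(key) {
            Some((_, last_access)) => {
                let previous = std::mem::replace(last_access, now);
                let moved = self.lru.remove(&previous).expect("lru and entries in sync");
                self.lru.insert(now, moved);
                true
            }
            None => false,
        }
    }

    /// Admits an entry of `size` bytes. Returns the keys evicted to make room
    /// (their files should be deleted before writing the new one, so the
    /// on-disk total never exceeds the budget), or `None` if the entry is
    /// larger than the whole cache.
    fn insert(&mut self, key: CacheKey, size: usize) -> Option<Vec<CacheKey>> {
        if size > self.capacity_bytes {
            return None;
        }
        if self.touch(&key) {
            // Data is immutable, so an existing entry never needs rewriting.
            return Some(Vec::new());
        }
        let mut evicted = Vec::new();
        while self.used_bytes + size > self.capacity_bytes {
            // Over budget with size <= capacity implies at least one entry exists.
            let (_, victim) = self.lru.pop_first().expect("non-empty while over budget");
            let (victim_size, _) = self.entries.remove(&victim).expect("in sync");
            self.used_bytes -= victim_size;
            evicted.push(victim);
        }
        let now = self.tick();
        self.used_bytes += size;
        self.entries.insert(key.clone(), (size, now));
        self.lru.insert(now, key);
        Some(evicted)
    }
}

fn main() {
    let mut index = CapacityIndex::new(100);
    let a: CacheKey = (PathBuf::from("a.fast"), 0..60);
    let b: CacheKey = (PathBuf::from("b.fast"), 0..30);
    let c: CacheKey = (PathBuf::from("c.fast"), 0..40);
    let _ = index.insert(a.clone(), 60);
    let _ = index.insert(b, 30);
    index.touch(&a); // `a` is now more recently used than `b`
    let evicted = index.insert(c, 40).expect("fits in capacity");
    println!("evicted: {:?}, used: {} bytes", evicted, index.used_bytes);
}
```

In this run, `b.fast` is evicted as the least recently used entry and `used_bytes` ends at exactly 100. Because the clock lives in memory, recency order is lost on restart unless it is persisted; rebuilding sizes from a directory scan is still possible, and an arbitrary initial order is one acceptable trade-off. Keeping this index separate from the storage code is one way to get the decoupled, stackable eviction/admission logic mentioned above.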

Pointers

  • https://github.com/pkhuong/kismet-cache/tree/main/src
  • https://github.com/mozilla/sccache/tree/main/src/lru_disk_cache

fulmicoton, Jul 12 '22 03:07

Facebook has a comprehensive C++ cache covering many scenarios: https://github.com/facebook/CacheLib

ddorian, Aug 02 '22 13:08

Putting this in the backlog, as our first SSD cache target has changed.

fulmicoton, May 30 '23 13:05