quickwit
Add an SSD cache
Right now, Quickwit's only long-term caches are:
- a cache used to store split footers (which contain the so-called hotcache)
- a cache for fast field data (a column-oriented representation of a field)
Both are extremely good candidates for caching, as the same piece of data is often accessed by different queries. Right now those caches are in RAM only.
For fast fields this is quite a problem, as they can be bulky. We would like to be able to cache fast fields (and possibly more pieces of data in the future) on an SSD.
More info
- The keys of this cache have the form
PathBuf, Range<usize>
where the range represents a range of bytes.
- The data is immutable.
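To make the key shape concrete, here is a minimal sketch of how such a key could be mapped to a flat file name on the SSD. Everything in it is an assumption for illustration, not existing Quickwit code: the `SliceKey` type, the `.slice` suffix, and the hand-rolled FNV-1a hash. FNV-1a is used only because its output stays stable across restarts and compiler versions.

```rust
use std::ops::Range;
use std::path::PathBuf;

/// Hypothetical key type: a file path plus a range of bytes within that file.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SliceKey {
    pub path: PathBuf,
    pub byte_range: Range<usize>,
}

/// 64-bit FNV-1a. Chosen here only because its output does not change
/// between runs or Rust versions, which matters for on-disk names.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

impl SliceKey {
    pub fn new(path: impl Into<PathBuf>, byte_range: Range<usize>) -> Self {
        SliceKey { path: path.into(), byte_range }
    }

    /// Number of payload bytes covered by this key.
    pub fn num_bytes(&self) -> usize {
        self.byte_range.len()
    }

    /// Flat file name for this key (assumed layout: one file per key).
    /// The path is hashed so that the name contains no directory separators.
    pub fn cache_file_name(&self) -> String {
        let path_hash = fnv1a64(self.path.to_string_lossy().as_bytes());
        format!(
            "{:016x}-{}-{}.slice",
            path_hash, self.byte_range.start, self.byte_range.end
        )
    }
}
```

Since the data is immutable, a given key always maps to the same bytes, so an entry never needs to be updated in place: it is either present and valid, or absent. A 64-bit hash can collide, so a real implementation would probably also store the full path inside the entry and compare it on read.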
Pitfalls
- The cache should be resilient to corruption. Validating a checksum on read might be wise.
- Killing the process in the middle of a download should not result in an invalid state (e.g. a truncated value in the cache) or a leak (files that need to be removed manually).
- The number of file descriptors should be limited.
- Restarting should be fast enough (tm).
- It is ok to assume only one process is writing into the cache (but a write lock should be added, right now there is none).
- A configured capacity in bytes should be strictly respected.
- Ideally eviction/admission logic should be decoupled and stackable with the RAM cache.
- We do not trust atime/mtime/ctime.
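Two of the pitfalls above (corruption, and a process killed mid-download) can be sketched with the usual "write to a temporary file, fsync, then rename" pattern plus a checksum footer. This is only a sketch under stated assumptions, not a proposed implementation: the function names, the `.tmp` suffix and the 8-byte footer layout are made up for illustration, and FNV-1a stands in for a real checksum such as CRC32 or xxHash.

```rust
use std::ffi::OsStr;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Stand-in checksum (64-bit FNV-1a); a real cache would likely use CRC32 or xxHash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Writes `payload` followed by an 8-byte little-endian checksum to a
/// temporary file, syncs it, then renames it to its final name. A crash
/// before the rename leaves only a `.tmp` file, never a truncated entry.
pub fn write_entry(cache_dir: &Path, file_name: &str, payload: &[u8]) -> io::Result<()> {
    let tmp_path = cache_dir.join(format!("{}.tmp", file_name));
    let final_path = cache_dir.join(file_name);
    let mut tmp_file = File::create(&tmp_path)?;
    tmp_file.write_all(payload)?;
    tmp_file.write_all(&fnv1a64(payload).to_le_bytes())?;
    tmp_file.sync_all()?;
    fs::rename(&tmp_path, &final_path)
}

/// Returns `Ok(None)` on a miss or on a corrupted entry. Corrupted entries
/// are deleted so that they do not keep consuming capacity.
pub fn read_entry(cache_dir: &Path, file_name: &str) -> io::Result<Option<Vec<u8>>> {
    let path = cache_dir.join(file_name);
    let mut bytes = match fs::read(&path) {
        Ok(bytes) => bytes,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(err) => return Err(err),
    };
    if bytes.len() < 8 {
        fs::remove_file(&path)?;
        return Ok(None);
    }
    let payload_len = bytes.len() - 8;
    let mut footer = [0u8; 8];
    footer.copy_from_slice(&bytes[payload_len..]);
    bytes.truncate(payload_len);
    if fnv1a64(&bytes) != u64::from_le_bytes(footer) {
        fs::remove_file(&path)?;
        return Ok(None);
    }
    Ok(Some(bytes))
}

/// Startup sweep: removes temporary files left behind by an interrupted
/// write, and returns how many were removed.
pub fn remove_leftover_tmp_files(cache_dir: &Path) -> io::Result<usize> {
    let mut num_removed = 0;
    for entry in fs::read_dir(cache_dir)? {
        let path = entry?.path();
        if path.extension() == Some(OsStr::new("tmp")) {
            fs::remove_file(&path)?;
            num_removed += 1;
        }
    }
    Ok(num_removed)
}
```

Files are opened only for the duration of one read or write, which keeps the number of open file descriptors small. The sketch does not cover the write lock, capacity accounting, or eviction. It also reads the whole entry into memory and does not fsync the parent directory after the rename, which a stricter durability guarantee would require.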
External pointers
There are a bunch of caches in Quickwit. The one discussed here is
- https://github.com/quickwit-oss/quickwit/blob/03d51183c8dc26b8b17afc3e082fd985917223e1/quickwit-storage/src/cache/quickwit_cache.rs
Right now it only applies to .fast files, and it should remain this way for this ticket.
It is relatively easy to add a new SSD-based implementation of https://github.com/quickwit-oss/quickwit/blob/03d51183c8dc26b8b17afc3e082fd985917223e1/quickwit-storage/src/cache/mod.rs#L58-L66.
Pointers
- https://github.com/pkhuong/kismet-cache/tree/main/src
- https://github.com/mozilla/sccache/tree/main/src/lru_disk_cache
Facebook has a comprehensive cache in C++ for many scenarios: https://github.com/facebook/CacheLib
Putting this in the backlog, as our first SSD cache target has changed.