OpenSearch
Setting a limit for file cache capacity
Is your feature request related to a problem? Please describe
Following up on #14004. Currently, the user-defined file cache size setting is validated against the total space of the file cache path:
https://github.com/opensearch-project/OpenSearch/blob/c71fd4a2f2e6d5d7d9f2f304c573180027af8f44/server/src/main/java/org/opensearch/node/Node.java#L2033
For a cache, this check does not seem reasonable. The file cache tries to evict blocks when its usage exceeds the user-defined size. At the same time, a query task may need new blocks that are located in remote storage. If the tolerance space (totalSpace - fileCacheSize) is very small, there will be no free space for caching the new blocks and the query task will fail.
Describe the solution you'd like
There are some options:
- the user-defined file cache size must be less than a specific, hard-coded percentage of the total space, like 95%;
- the tolerance size must be larger than a specific, hard-coded size, like 50G;
- add a separate tolerance size setting (non-dynamic, can be set as a byte size or a percentage).
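As a rough illustration of how the first two options could be checked together, here is a minimal sketch. The class `FileCacheCapacityCheck`, its method, and its parameters are hypothetical and are not part of OpenSearch; the 95% and 50G values in the usage note are only the examples from the list above.

```java
/** Hypothetical helper for illustration only; not an OpenSearch class. */
final class FileCacheCapacityCheck {
    private FileCacheCapacityCheck() {}

    /**
     * Validates a requested file cache size and returns the resulting tolerance space.
     *
     * @param totalSpaceBytes   total space of the file cache path
     * @param requestedBytes    user-defined file cache size
     * @param maxRatio          largest allowed fraction of total space (assumed, e.g. 0.95)
     * @param minToleranceBytes smallest allowed totalSpace - fileCacheSize (assumed, e.g. 50G)
     */
    static long validate(long totalSpaceBytes, long requestedBytes, double maxRatio, long minToleranceBytes) {
        if (requestedBytes <= 0 || requestedBytes >= totalSpaceBytes) {
            throw new IllegalArgumentException("cache size must be positive and less than total space");
        }
        // Option 1: cap the cache size at a percentage of the total space.
        long maxByRatio = (long) (totalSpaceBytes * maxRatio);
        if (requestedBytes > maxByRatio) {
            throw new IllegalArgumentException("cache size exceeds the allowed fraction of total space");
        }
        // Options 2 and 3: require a minimum amount of space outside the cache.
        long toleranceBytes = totalSpaceBytes - requestedBytes;
        if (toleranceBytes < minToleranceBytes) {
            throw new IllegalArgumentException("tolerance space is smaller than the required minimum");
        }
        return toleranceBytes;
    }
}
```

For example, with 1000 GiB of total space, a ratio of 0.95 and a minimum tolerance of 50 GiB, a requested size of 900 GiB would pass and leave 100 GiB of tolerance, while 960 GiB would be rejected by the ratio check.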
Related component
Storage:Snapshots
Describe alternatives you've considered
Treat the current file cache capacity as the maximum size allowed, and introduce another setting, evict_watermark, expressed as a percentage of the file cache capacity. When the total size of the cache entries exceeds that percentage of the capacity, the file cache starts trying to evict entries. When a new block needs to be cached and there is no free space left in the file cache, we can either fail the corresponding query or hold the file block in an on-heap memory block and serve the query from there.
In this way, the behavior of the file cache may become more predictable, especially when the search role is deployed together with other node roles on the same node: the file cache is kept from encroaching on disk space reserved for other purposes, which would otherwise affect the normal operation of the node. When implementing a writable warm index, we may also need a file cache alongside a local directory. If a sudden burst of queries or one big query causes the file cache to take up all the available space, writes can fail.
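The watermark alternative described above could look roughly like the following sketch. It only models the space accounting, not real eviction or block storage, and every name in it (`WatermarkedCacheAccounting`, `Admission`, `admit`, `release`) is hypothetical rather than an existing OpenSearch API.

```java
/** Hypothetical accounting model for illustration only; not an OpenSearch class. */
final class WatermarkedCacheAccounting {
    /** Outcome of trying to cache a new block. */
    enum Admission { ADMIT, ADMIT_AND_EVICT, REJECT_OR_USE_HEAP }

    private final long capacityBytes;   // hard maximum for the file cache
    private final long watermarkBytes;  // usage above this starts eviction
    private long usedBytes;

    WatermarkedCacheAccounting(long capacityBytes, double evictWatermark) {
        if (capacityBytes <= 0 || evictWatermark <= 0.0 || evictWatermark > 1.0) {
            throw new IllegalArgumentException("capacity must be positive and watermark in (0, 1]");
        }
        this.capacityBytes = capacityBytes;
        this.watermarkBytes = (long) (capacityBytes * evictWatermark);
    }

    /** Accounts for a new block, never letting usage exceed the capacity. */
    synchronized Admission admit(long blockBytes) {
        if (blockBytes < 0) {
            throw new IllegalArgumentException("block size must not be negative");
        }
        if (blockBytes > capacityBytes - usedBytes) {
            // No free space: the caller fails the query or falls back to an on-heap block.
            return Admission.REJECT_OR_USE_HEAP;
        }
        usedBytes += blockBytes;
        // Above the watermark the caller should start evicting entries.
        return usedBytes > watermarkBytes ? Admission.ADMIT_AND_EVICT : Admission.ADMIT;
    }

    /** Accounts for an evicted or released block. */
    synchronized void release(long blockBytes) {
        usedBytes = Math.max(0L, usedBytes - blockBytes);
    }

    synchronized long usedBytes() {
        return usedBytes;
    }
}
```

Because usage is capped at the configured capacity instead of at the total disk space, the space outside the cache stays available to other node roles or to a local write directory, which is the predictability argument made above.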
Additional context
No response