OpenSearch
OpenSearch copied to clipboard
[BUG] The `/<index>/_upgrade` API can produce segments larger than max segment size
Describe the bug
I was talking with @mch2 yesterday, saying "Hey -- we should add a dedicated API, or maybe just a flag to force_merge
, to rewrite old segments using the latest codec after doing an upgrade." I'm glad I talked with him before creating an issue for it, because he told me that it already exists. Great!
I was curious about how it works, so I dove through the call to InternalEngine.forceMerge()
, where it passes the upgrade = true
parameter, which it sets on the OpenSearchMergePolicy
, and then this logic gets called to do the actual segment rewriting. Cool.
Except, it's not cool at all, because, if there's nothing to upgrade (because all the segments have already been (re)written using the latest codec), it falls back to the underlying merge policy's force merge logic, which will compute merges with unbounded max segment size.
Overall, the real problem is OpenSearchTieredMergePolicy
and that annoying forceMergePolicy
that ignores max segment size. While we need it in order to maintain the existing behavior when merging down to a single segment (e.g. before hot-to-warm migration), IMO we should not use forceMergePolicy
unless we're explicitly trying to merge down to a small number of segments. Specifically, I would say that we should only use forceMergePolicy
if total_shard_size / maxSegmentCount > maxSegmentSizeMB
-- that is, whenever we can merge down to maxSegmentCount
segments without exceeding the max segment size, then we should honor the max segment size.
Related component
Indexing
To Reproduce
- Create an index with 1 shard, 0 replicas (for simplicity), called
my-index
, with a merge policy that sets max segment size to 500MB. - Index 5GB of data.
- Run
POST /my-index/_upgrade
- Run
GET /_cat/segments
- Maybe repeat steps 3 and 4 a few times.
- Notice that (eventually), a segment with size greater than 500MB gets written.
Expected behavior
I would expect max segment size to be honored except when I force merge to a small number of segments.
Additional Details
Plugins N/A
Screenshots N/A
Host/Environment (please complete the following information):
- OS: My brain reading through code
- Version: Way too old
Additional context N/A
Whenever we can merge down to maxSegmentCount segments without exceeding the max segment size, then we should honor the max segment size.
Makes sense to me meaning max segment count would be honored over max size. This would cover any force_merge not just on upgrade calls right?
Also this api does not appear to be documented at index or cluster lvl at https://opensearch.org/docs/latest/api-reference/index-apis/index/.