OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

[BUG] The `/<index>/_upgrade` API can produce segments larger than max segment size

Open msfroh opened this issue 9 months ago • 2 comments

Describe the bug

I was talking with @mch2 yesterday, saying "Hey -- we should add a dedicated API, or maybe just a flag to force_merge, to rewrite old segments using the latest codec after doing an upgrade." I'm glad I talked with him before creating an issue for it, because he told me that it already exists. Great!

I was curious about how it works, so I dove through the call to InternalEngine.forceMerge(), where it passes the upgrade = true parameter, which it sets on the OpenSearchMergePolicy, and then this logic gets called to do the actual segment rewriting. Cool.

Except, it's not cool at all, because, if there's nothing to upgrade (because all the segments have already been (re)written using the latest codec), it falls back to the underlying merge policy's force merge logic, which will compute merges with unbounded max segment size.

Overall, the real problem is OpenSearchTieredMergePolicy and that annoying forceMergePolicy that ignores max segment size. While we need it in order to maintain the existing behavior when merging down to a single segment (e.g. before hot-to-warm migration), IMO we should not use forceMergePolicy unless we're explicitly trying to merge down to a small number of segments. Specifically, I would say that we should only use forceMergePolicy if total_shard_size / maxSegmentCount > maxSegmentSizeMB -- that is, whenever we can merge down to maxSegmentCount segments without exceeding the max segment size, then we should honor the max segment size.

Related component

Indexing

To Reproduce

  1. Create an index with 1 shard, 0 replicas (for simplicity), called my-index, with a merge policy that sets max segment size to 500MB.
  2. Index 5GB of data.
  3. Run POST /my-index/_upgrade
  4. Run GET /_cat/segments
  5. Maybe repeat steps 3 and 4 a few times.
  6. Notice that (eventually), a segment with size greater than 500MB gets written.

Expected behavior

I would expect max segment size to be honored except when I force merge to a small number of segments.

Additional Details

Plugins N/A

Screenshots N/A

Host/Environment (please complete the following information):

  • OS: My brain reading through code
  • Version: Way too old

Additional context N/A

msfroh avatar May 07 '24 00:05 msfroh

Whenever we can merge down to maxSegmentCount segments without exceeding the max segment size, then we should honor the max segment size.

Makes sense to me meaning max segment count would be honored over max size. This would cover any force_merge not just on upgrade calls right?

Also this api does not appear to be documented at index or cluster lvl at https://opensearch.org/docs/latest/api-reference/index-apis/index/.

mch2 avatar May 07 '24 00:05 mch2

[Triage - attendees 1 2 3 4] @msfroh Thanks for filing.

andrross avatar May 08 '24 15:05 andrross