OpenSearch TermsAggregatorTests flaky test fix

Description

Adding documents in a single chunk in unit test to avoid multiple segments in the unit test.

The following assertion was failing:

                    assertEquals(expectedCollectCount, aggregator.getCollectCount().get());

The number of collect() calls is used to check whether the optimization has kicked in or not. The tests written with tight assertions assuminga single segment.

In local development, running this test in isolation, it is not feasible to get the test failing as it might be creating a single segment. However, in gradle CI (see linked issue) and running the test with other tests, it may happen that multiple segments are created (multiple commits after individual document addition), which is the only explanation why the collect() count assertion is failing case.

So inspired from https://github.com/opensearch-project/OpenSearch/blob/main/server/src/test/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregatorTests.java#L1655-L1668 editing the test to add all documents in a single call to avoid committing changes to index and avoid multiple segments.

Did not add changelog as this is a flaky test fix.

Related Issues

Resolves https://github.com/opensearch-project/OpenSearch/issues/12640

Check List

[x] New functionality includes testing.
- [x] All tests pass
[x] New functionality has been documented.
- [x] New functionality has javadoc added
[x] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
[x] Commits are signed per the DCO using --signoff
[x] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
[x] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

May 06 '24 17:05 sandeshkr419

:x: Gradle check result for a358230d2f3fe612083b467298e8f4dc748ab2b1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

May 06 '24 17:05 github-actions[bot]

:white_check_mark: Gradle check result for 6b95eeb2ad5d1a663c8201f310453d3d4af3d3f4: SUCCESS

May 06 '24 18:05 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 71.60%. Comparing base (b15cb0c) to head (6b95eeb). Report is 289 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #13567      +/-   ##
============================================
+ Coverage     71.42%   71.60%   +0.18%     
- Complexity    59978    61094    +1116     
============================================
  Files          4985     5052      +67     
  Lines        282275   287126    +4851     
  Branches      40946    41604     +658     
============================================
+ Hits         201603   205594    +3991     
- Misses        63999    64595     +596     
- Partials      16673    16937     +264

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

May 06 '24 18:05 codecov[bot]

Looks like SpotLess check failed. Can you take a look?

May 06 '24 21:05 kkmr

While you are at it, could also add some info about the validation you did for the change?

May 06 '24 21:05 kkmr

Adding documents in a single chunk in unit test to avoid multiple segments in the unit test.

Is the test dependent on the number of segments? Is that the assertion failing in the original bug?

May 06 '24 23:05 jainankitk

If the test depends on a specific number of segments then can we [1] add an assertion for the segment count and [2] force merge to the correct number of segments within the test?

Not sure about the viability of [2] since I see a test path for deleted docs.

May 07 '24 18:05 jed326

@jed326

If the test depends on a specific number of segments then can we [1] add an assertion for the segment count and [2] force merge to the correct number of segments within the test?

[1] No its not feasible to add assertion of segment count - I could not find a utility to do so in our codebase. Introducing such utility may not be within the scope of this change. [2] Adding documents in a single call (addDocuments()) ensures commits are not happening in between multiple calls (addDocument()).

@jainankitk @kkmr

Added more details in description now.

Tests are not failing locally (most likely running in silo). But looking at failures, it looks like if we can force the documents are added in a single call, it will result in a single segment like the one in DateHistogramTest given we already have no merge policy in place during index creation. The failures in flaky tests reported in different CI calls can only occur when there are multiple segments so this is more of a calculated change that we can do to solve the flakiness with no side-effects.

May 08 '24 05:05 sandeshkr419

The failures in flaky tests reported in different CI calls can only occur when there are multiple segments so this is more of a calculated change that we can do to solve the flakiness with no side-effects.

Can you try to run the test locally until failure with and without this change? And report back the number of successful runs without failure?

May 08 '24 16:05 jainankitk

The failures in flaky tests reported in different CI calls can only occur when there are multiple segments so this is more of a calculated change that we can do to solve the flakiness with no side-effects.

Can you try to run the test locally until failure with and without this change? And report back the number of successful runs without failure?

So I have iterated through the test 250 times in a loop for both scenarios (with & without my change) with 0 failures in each case. So still no luck reproducing it in present way.

May 08 '24 20:05 sandeshkr419

So I have iterated through the test 250 times in a loop for both scenarios (with & without my change) with 0 failures in each case. So still no luck reproducing it in present way.

Can you explicitly issue IndexWriter.commit to create multiple segments and confirm the issue? Just want to ensure that this change is helping

May 08 '24 22:05 jainankitk

Can you explicitly issue IndexWriter.commit to create multiple segments and confirm the issue? Just want to ensure that this change is helping

Hey @jainankitk, thanks for suggesting this. In the original code, I added indexWriter.commit(); after 2 addDocument() and I was able to fail the tests accordingly.

Meaning cases where there will 2 segments + deleted document case and optimization will kick in for only 1 of the 2 segments and not both - test will fail with the same CI exception.

So this validates my suspect of multiple segments being the issue.

May 09 '24 20:05 sandeshkr419

@sandeshkr419 - Thanks for confirming the issue. Approved!

May 09 '24 21:05 jainankitk

No its not feasible to add assertion of segment count - I could not find a utility to do so in our codebase. Introducing such utility may not be within the scope of this change.

Since you have an IndexReader, could expectedCollectCount be a Function<IndexReader, Integer>, then for the case where the optimization kicks in, you can pass in a lambda like (r) -> r.leaves().size()? For the non-optimized case, it would be (r) -> docCount, I guess.

May 16 '24 17:05 msfroh

I think I missed where I can check the number of segments using reader.leaves.size()

Since you have an IndexReader, could expectedCollectCount be a Function<IndexReader, Integer>, then for the case where the optimization kicks in, you can pass in a lambda like (r) -> r.leaves().size()? For the non-optimized case, it would be (r) -> docCount, I guess.

The cases which are failing are when one of the segment can use optimized flow (where (r) -> 0) and another segment use the un-optimized flow (r) -> docCount. I think I can assert expectedCollectCount to be number of LiveDocs in segments having deleted documents (unoptimized segments) as segments with 0 deletes will not contribute to in collect() calls.

May 16 '24 18:05 sandeshkr419

@msfroh I see this is merged. Would you recommend reverting this back and have a multiple segment case as we discussed offline.

Tbh, I like it in present state since its much more atomic as the improvements were on segment level and we are forcing the index creation with a single segment. If required, we can have a multiple segment case separately independent of the tests in this PR.

May 17 '24 20:05 sandeshkr419

As per offline discussion with @msfroh - will keep this change and will add a separate test to cover multi-segment scenario.

May 20 '24 17:05 sandeshkr419

OpenSearch OpenSearch copied to clipboard

TermsAggregatorTests flaky test fix

Description

Related Issues

Check List

Codecov Report

OpenSearch
OpenSearch copied to clipboard