OpenSearch icon indicating copy to clipboard operation
OpenSearch copied to clipboard

Fix bug "synonym_graph filter fails with word_delimiter_graph when using whitespace or classic tokenizer in synonym_analyzer"

Open laminelam opened this issue 3 months ago • 10 comments

This PR fixes the "synonym_graph filter fails with word_delimiter_graph when using whitespace or classic tokenizer in synonym_analyzer" bug

Investigated the issue and looks like there are 2 causes:

  • When building the analyzers, if one fails for some reason, it throws an exception and the process stops. So the customSynonymAnalyzer does not get instantiated.
  • On the other hand, if the customSynonymAnalyzer depends on another one that hasn't been built (and registred) yet the process fails too.

analysisRegistry.getAnalyzer(synonymAnalyzerName);

This is not enough because it only looks into the built in and pre built in analyzers. The one from settings are not there.

The solution is two-fold:

  • Fail safe instead of fail fast when building the analyzers.
  • Build the depending analyzers first.

Fail safe instead of fail fast when building the analyzers Right now, if an analyzer fails for some reason the whole building process fails with an exception.

Build the depending analyzers first: Synonym custom analyzers may depend on another analyzer that has to be built first. The PR adds a logic to:

  • add option "order" attribute that defines precedence order between analyzers
  • add 'analyzersBuiltSoFar' to getChainAwareTokenFilterFactory to pass the already built analyzers needed by the one being built

Resolves #18037

Check List

  • [x] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

laminelam avatar Sep 07 '25 17:09 laminelam

:x: Gradle check result for 5c8fbe6fa2fae45fe64bfcf1cedc2e3aa4915a47: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 07 '25 18:09 github-actions[bot]

:x: Gradle check result for 267f48eb9f435c59b4047b23621f11ea9e519310: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 07 '25 22:09 github-actions[bot]

Thanks @laminelam looks like similar issue from past https://github.com/opensearch-project/OpenSearch/issues/16263#issuecomment-2439698469 ?

prudhvigodithi avatar Sep 22 '25 01:09 prudhvigodithi

:x: Gradle check result for 48cc847f725e448f486db8398a402ab2cfe5a4c5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Sep 22 '25 01:09 github-actions[bot]

Thanks @laminelam looks like similar issue from past #16263 (comment) ?

Yes I think they are related. This fix should handle both situations

laminelam avatar Oct 13 '25 15:10 laminelam

:x: Gradle check result for 49bb330a3a080ff4cde2d7fdc0a6f0a2284308d1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Oct 27 '25 15:10 github-actions[bot]

@prudhvigodithi Friendly ping to review this code, thanks!

@laminelam Can you fix up the conflicts by rebasing, the changelog failure by adding an appropriate entry to the changelog, and the DCO failure by ensuring all commits are signed?

andrross avatar Nov 21 '25 22:11 andrross

:x: Gradle check result for 7f3e8fd9e9fc4bddf6fe7884e58550901056da7b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar Nov 22 '25 16:11 github-actions[bot]

Added a small comment on the original issue https://github.com/opensearch-project/OpenSearch/issues/18037#issuecomment-3567040409.

I did look at the PR at high level and added some comments, will take a look at it again. Thanks @laminelam.

prudhvigodithi avatar Nov 22 '25 22:11 prudhvigodithi