Tika tests

Open finnegancarroll opened this issue 9 months ago • 1 comments

Description

Enhance the Tika document parsing tests by validating parsed output against the output of the current Tika version.

Related Issues

Resolves "Improve the validation on TikaDocTests #12887"

Check List

  • [x] New functionality includes testing.
    • [x] All tests pass
  • [ ] New functionality has been documented.
    • [ ] New functionality has javadoc added
  • [ ] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • [ ] Commits are signed per the DCO using --signoff
  • [ ] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [ ] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

finnegancarroll avatar May 09 '24 22:05 finnegancarroll

:x: Gradle check result for 7ee60451f19403c5b80b706b86dcdb38c7bfec31: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 09 '24 22:05 github-actions[bot]

:x: Gradle check result for 810f3a90f53f83fa0eadeebfe053fb083116e90a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 10 '24 18:05 github-actions[bot]

Gradle check failing due to unrelated flaky test: #11979

finnegancarroll avatar May 10 '24 19:05 finnegancarroll

:x: Gradle check result for 810f3a90f53f83fa0eadeebfe053fb083116e90a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 10 '24 19:05 github-actions[bot]

:x: Gradle check result for 3bd9469706a733fb51fbec64ee3912c7d5af4884: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 10 '24 23:05 github-actions[bot]

:x: Gradle check result for 2dc3fcf87fd089a6e893f5ff4e413ca8488707fe: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 10 '24 23:05 github-actions[bot]

:x: Gradle check result for ddd4b5659f868aa4cf5852af47cb8d1061a217bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 11 '24 00:05 github-actions[bot]

Known flaky test: #10006

finnegancarroll avatar May 13 '24 16:05 finnegancarroll

:x: Gradle check result for ddd4b5659f868aa4cf5852af47cb8d1061a217bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 13 '24 18:05 github-actions[bot]

Known flaky test: #13476

finnegancarroll avatar May 13 '24 18:05 finnegancarroll

  1. Is there any advantage of having those zipped in the repo other than having to unzip them?
  2. Keeping a checksum map in code feels a little odd. I don't feel strongly about it, but maybe a .checksum file would be a bit cleaner.

dblock avatar May 13 '24 22:05 dblock
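
A minimal sketch of what the `.checksum` file idea could look like. This is not the code from this PR: the class name `ChecksumFileSketch`, the one-entry-per-line `<fileName> <sha256-hex>` format, and the choice of SHA-256 are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: loads expected digests from a checksum file's contents
// and compares them with the digest of freshly parsed text.
class ChecksumFileSketch {

    // Parses lines of the assumed form "<fileName> <sha256-hex>".
    // Blank lines and lines starting with '#' are skipped.
    static Map<String, String> parse(String contents) {
        Map<String, String> checksums = new HashMap<>();
        for (String raw : contents.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                throw new IllegalArgumentException("Malformed checksum line: " + line);
            }
            checksums.put(line.substring(0, split).trim(), line.substring(split + 1));
        }
        return checksums;
    }

    // Hex-encoded SHA-256 of the UTF-8 bytes of the given text.
    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(text.getBytes(StandardCharsets.UTF_8))) {
                // Locale.ROOT keeps the hex encoding independent of the JVM's default locale.
                hex.append(String.format(Locale.ROOT, "%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if the file has a recorded checksum and the parsed text matches it.
    static boolean matches(Map<String, String> checksums, String fileName, String parsedText) {
        String expected = checksums.get(fileName);
        return expected != null && expected.equals(sha256Hex(parsedText));
    }
}
```

Keeping the expected digests in a data file means a Tika upgrade that changes parsed output only touches that file, not the test source.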

:x: Gradle check result for 7e3ec2d4d0594d11ff89253851db5c0b5dece99d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 14 '24 00:05 github-actions[bot]

Hi @dblock and @reta thanks for taking a look! I've moved the checksum map to a separate file.

  1. Is there any advantage of having those zipped in the repo other than having to unzip them?

I believe the intent here is to hide the files from the linter. testEXCEL.xls, for example, is not UTF-8 and fails the precommit "forbiddenPatterns" task.

finnegancarroll avatar May 14 '24 17:05 finnegancarroll
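
To illustrate why a binary sample such as an .xls file trips a UTF-8 check, here is a small sketch of strict UTF-8 validation. It is not the implementation of the "forbiddenPatterns" task; the class name `Utf8CheckSketch` is hypothetical, and the assumption is only that the task reads files as UTF-8 text and rejects anything that does not decode cleanly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether a byte sequence is well-formed UTF-8.
class Utf8CheckSketch {

    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Legacy .xls files are OLE2 compound documents that start with the bytes D0 CF 11 E0 A1 B1 1A E1. That header is already malformed as UTF-8, because 0xD0 opens a two-byte sequence and 0xCF is not a valid continuation byte.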

:x: Gradle check result for 3fcc4bcc3fe0a091d777d37dda557273742b6cb2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 14 '24 18:05 github-actions[bot]

Known flaky test: #13600

finnegancarroll avatar May 14 '24 19:05 finnegancarroll

:x: Gradle check result for 3fcc4bcc3fe0a091d777d37dda557273742b6cb2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 15 '24 12:05 github-actions[bot]

:x: Gradle check result for 9ae651e74143c679dff57a610e9ef7ec802647cb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 15 '24 16:05 github-actions[bot]

:x: Gradle check result for ef62853eb0021108b4ad043d3f704e487c26e0fd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] avatar May 15 '24 17:05 github-actions[bot]

❌ Gradle check result for ef62853: FAILURE

Needs https://github.com/opensearch-project/OpenSearch/pull/13673

reta avatar May 15 '24 17:05 reta

@finnegancarroll we sadly have a pretty flaky test suite right now; for example, this combination fails for me:

./gradlew ':plugins:ingest-attachment:test' --tests "org.opensearch.ingest.attachment.TikaDocTests.testParseSamples" -Dtests.seed=98D53194946B5C85 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=hi-IN -Dtests.timezone=Asia/Istanbul

Please let ./gradlew :plugins:ingest-attachment:check run for a couple of hours, to make sure the test suite is stable, thank you.

reta avatar May 15 '24 18:05 reta

:white_check_mark: Gradle check result for f0cc85434c55fd854a867177e2c6f957b6137ca5: SUCCESS

github-actions[bot] avatar May 16 '24 00:05 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 71.56%. Comparing base (b15cb0c) to head (f0cc854). Report is 286 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13618      +/-   ##
============================================
+ Coverage     71.42%   71.56%   +0.14%     
- Complexity    59978    61201    +1223     
============================================
  Files          4985     5059      +74     
  Lines        282275   287522    +5247     
  Branches      40946    41646     +700     
============================================
+ Hits         201603   205759    +4156     
- Misses        63999    64777     +778     
- Partials      16673    16986     +313     

codecov[bot] avatar May 16 '24 00:05 codecov[bot]

Removed strict checksum validation for some additional files whose parsing is locale-dependent. Ran the suite for a couple of hours, and with every locale returned by Locale.getAvailableLocales(), to ensure no flaky cases remain.

finnegancarroll avatar May 16 '24 02:05 finnegancarroll
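
A locale sweep of the kind described above could be sketched as follows. This is an illustration in plain Java, not the approach used in this PR: the OpenSearch test framework selects the locale through `-Dtests.locale`, and `LocaleSweepSketch` and `failingLocales` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BooleanSupplier;

// Hypothetical helper: runs a check under every available default locale
// and collects the locales for which it fails.
class LocaleSweepSketch {

    static List<Locale> failingLocales(BooleanSupplier check) {
        Locale original = Locale.getDefault();
        List<Locale> failures = new ArrayList<>();
        try {
            for (Locale locale : Locale.getAvailableLocales()) {
                Locale.setDefault(locale);
                if (!check.getAsBoolean()) {
                    failures.add(locale);
                }
            }
        } finally {
            // Always restore the default so later code is not affected by the sweep.
            Locale.setDefault(original);
        }
        return failures;
    }
}
```

For example, a check such as `String.format("%.1f", 1.5).equals("1.5")` fails under locales that use a decimal comma, which is the same kind of variation that makes a strict checksum over locale-dependent parser output unstable.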

This looks better than what we have, @reta any objections?

It really does, no objections @dblock, just double-checking that no flakiness is going to be introduced

reta avatar May 16 '24 13:05 reta

This looks better than what we have, @reta any objections?

It really does, no objections @dblock, just double-checking that no flakiness is going to be introduced

Thanks. All yours to merge.

dblock avatar May 16 '24 14:05 dblock