solr icon indicating copy to clipboard operation
solr copied to clipboard

SOLR-16824: Back port to branch 9x

Open epugh opened this issue 1 year ago • 23 comments

https://issues.apache.org/jira/browse/SOLR-16824

Description

Backport to branch_9x the code in https://github.com/apache/solr/commit/0df3e63ffbdcde0eb953876a3fb47994d334d8f2. The changes are backwards compatible, however we need to remove the use of OPTION_CREDENTIALS from all the commands.

Solution

Please provide a short description of the approach taken to implement your solution.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • [ ] I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • [ ] I have created a Jira issue and added the issue ID to my pull request title.
  • [ ] I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • [ ] I have developed this patch against the main branch.
  • [ ] I have run ./gradlew check.
  • [ ] I have added tests for my changes.
  • [ ] I have added documentation for the Reference Guide

epugh avatar Jun 27 '24 16:06 epugh

Humm.. I expected Crave and all the script tests to run on github.....

epugh avatar Jun 27 '24 16:06 epugh

I think you used the wrong JIRA in the title and description. Could you add some content to the title in addition to the backport?

HoustonPutman avatar Jun 27 '24 17:06 HoustonPutman

I think you used the wrong JIRA in the title and description. Could you add some content to the title in addition to the backport?

Argh, that is what I get for not copy-pasting the JIRA ticket number!

epugh avatar Jun 27 '24 17:06 epugh

Okay, this is starting to turn into a bear... I'm working through dealing with the fact that in 9x most of our tools don't have basicauth, and that there are lots of bats tests on main that aren't on branch_9x... We'll see....

epugh avatar Jun 28 '24 10:06 epugh

Perhaps more of the actions should run for PRs against the stable branch, e.g. add stable branch to https://github.com/apache/solr/blob/main/.github/workflows/tests-via-crave.yml#L6

janhoy avatar Jun 28 '24 13:06 janhoy

I merged the changes to github workflows and triggered a new build. The precommit workflow now runs, but I cannot see the other workflows kicking off...

janhoy avatar Jun 28 '24 15:06 janhoy

I'll take a stab at some of the failures...

janhoy avatar Jun 28 '24 15:06 janhoy

Pushed some commits that at least makes Solr compile, and Fixes HealthCheckToolTest.

I propose that we revert most .bats files to testing original (deprecated) options, so we know the old cli API works.

I can see some confusion with /solr url suffix.

janhoy avatar Jun 28 '24 23:06 janhoy

Okay, this is starting to turn into a bear... I'm working through dealing with the fact that in 9x most of our tools don't have basicauth, and that there are lots of bats tests on main that aren't on branch_9x... We'll see....

Wrt basicAuth support with --credentials, I just kept what 9x already has, i.e. no basic auth support for all the tools, so removed OPTION_CREDENTIALS from those. Kept it for AuthTool only, of course, with added support for -credentials.

janhoy avatar Jun 29 '24 10:06 janhoy

Your changes look great, I especially like the helper for deprecated and current option look up. That type of logic is spread about in the work I did and I guess easy to goof!

epugh avatar Jun 29 '24 10:06 epugh

Wrt the create tool vs create_collection and create_core, I tried to keep all three, may still be a bit un-polished.

EDIT Added some fixes to make create_collection and create_core accept new-style args. However, I wonder if we could simplify all this and only keep CreateTool and make bin/solr always call that one...

janhoy avatar Jun 29 '24 10:06 janhoy

I believe I fixed the docker tests (gradlew testDocker), which failed at create_core_exec :)

But when building locally with gradlew dev, and then running bin/solr create_core -c foo, I get a scary error java.lang.NoClassDefFoundError: org/apache/solr/logging/MDCSnapshot

I'm not able to proceed with the hunt more for now, but with PR tests running it should be easier for any committer to pick a test failure and fix it!

janhoy avatar Jun 29 '24 11:06 janhoy

Fixed the test_auth.bats test in commit https://github.com/apache/solr/pull/2540/commits/f187b3e6fd762a9ba047bf6828193b56e83f6f74.

Reason: A mix of missing back-compat options parsing and missing /solr suffix on baseUrl.

janhoy avatar Jun 29 '24 23:06 janhoy

Pushed several changes

  • Finally fixed the /solr url suffix normalization issue I hope
  • CreateTool now used for create, create_core and create_collection.
  • solr create -h usage options printed by SolrCLI (with 120 c width instead of 74)
  • Docker tests once again pass
  • BATS tests for auth and create now pass.

Still many BATS tests fail due to missing cmdline definitions...

janhoy avatar Jul 01 '24 00:07 janhoy

Crave tests locally turned up with only two failed tests, and they definitely look relevant to this PR:

ERROR: The following test(s) have failed:
  - org.apache.solr.cloud.PackageManagerCLITest.testPackageManager (:solr:core)
    Test history: https://ge.apache.org/scans/tests?search.rootProjectNames=solr-root&tests.container=org.apache.solr.cloud.PackageManagerCLITest&tests.test=testPackageManager http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.PackageManagerCLITest.testPackageManager
    Test output: /tmp/src/solr/solr/core/build/test-results/test/outputs/OUTPUT-org.apache.solr.cloud.PackageManagerCLITest.txt
    Reproduce with: ./gradlew :solr:core:test --tests "org.apache.solr.cloud.PackageManagerCLITest.testPackageManager" -Ptests.jvms=96 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=7B8E87042F3A6E2C -Ptests.timeoutSuite=600000! -Ptests.file.encoding=ISO-8859-1

  - org.apache.solr.cloud.SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection (:solr:core)
    Test history: https://ge.apache.org/scans/tests?search.rootProjectNames=solr-root&tests.container=org.apache.solr.cloud.SolrCloudExampleTest&tests.test=testLoadDocsIntoGettingStartedCollection http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection
    Test output: /tmp/src/solr/solr/core/build/test-results/test/outputs/OUTPUT-org.apache.solr.cloud.SolrCloudExampleTest.txt
    Reproduce with: ./gradlew :solr:core:test --tests "org.apache.solr.cloud.SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection" -Ptests.jvms=96 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=7B8E87042F3A6E2C -Ptests.timeoutSuite=600000! -Ptests.file.encoding=ISO-8859-1

janhoy avatar Jul 01 '24 00:07 janhoy

PackageManagerCLITest.testPackageManager was just a case of solrUrl and solr-url.. The other one may be about dealing with the /solr in the path or not...

epugh avatar Jul 01 '24 04:07 epugh

I looked a bit more at SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection to see if I could figure it out... I thought I had some ideas, but no luck.. I believe we are struggling to detect if we are in cloud mode or not, and maybe we are not crafting the right url.. I checked, and the context route being only on main may be part of the problem:

* SOLR-16911: Establish /solr as the only host context supported by Solr, removing legacy ability to change this.  Solr paths will only be either /solr or /api now.  (Eric Pugh)

epugh avatar Jul 01 '24 04:07 epugh

Just fixed the test_delete_collection.bats test (Linux only).

janhoy avatar Jul 01 '24 12:07 janhoy

Getting there... All BATS tests now run, except these five:

not ok 32 solr help flag prints help in 1521ms
# (from function `assert_output' in file /home/runner/work/solr/solr/.gradle/node/packaging/node_modules/bats-assert/src/assert.bash, line 247,
#  in test file test/test_help.bats, line 26)
#   `assert_output --partial 'Usage: solr COMMAND OPTIONS'' failed

not ok 33 solr with no flags prints help in 133ms
# (in test file test/test_help.bats, line 30)
#   `run -1 solr' failed, expected exit code 1, got 0

not ok 37 status help flag prints help in 183ms
# (from function `assert_output' in file /home/runner/work/solr/solr/.gradle/node/packaging/node_modules/bats-assert/src/assert.bash, line 247,
#  in test file test/test_help.bats, line 55)
#   `assert_output --partial 'usage: status'' failed

not ok 38 healthcheck help flag prints help in 196ms
# (from function `assert_output' in file /home/runner/work/solr/solr/.gradle/node/packaging/node_modules/bats-assert/src/assert.bash, line 247,
#  in test file test/test_help.bats, line 63)
#   `assert_output --partial 'usage: healthcheck'' failed
#
# -- output does not contain substring --
# substring (1 lines):
#   usage: healthcheck
# output (14 lines):
#
#   ERROR: Unrecognized or misplaced argument: --help!

not ok 78 status does not expose cli parameters to end user in 278ms
# (from function `assert_output' in file /home/runner/work/solr/solr/.gradle/node/packaging/node_modules/bats-assert/src/assert.bash, line 247,
#  in test file test/test_status.bats, line 45)
#   `assert_output --partial "ERROR: Unrecognized or misplaced argument: -solr!"' failed

janhoy avatar Jul 01 '24 13:07 janhoy

All BATS are green 🦇

Please do a review of my latest couple commits to see if you agree with the decisions.

janhoy avatar Jul 01 '24 14:07 janhoy

I looked a bit more at SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection to see if I could figure it out... I thought I had some ideas, but no luck.. I believe we are struggling to detect if we are in cloud mode or not, and maybe we are not crafting the right url.. I checked, and the context route being only on main may be part of the problem:

* SOLR-16911: Establish /solr as the only host context supported by Solr, removing legacy ability to change this.  Solr paths will only be either /solr or /api now.  (Eric Pugh)

Yea, I gave it a shot as well, but the test (BaseDistributedSearchTestCase) insists in using some dynamic host-context instead of /solr. I tried hardcoding context as in main, but to no avail. This is a tricky one.

janhoy avatar Jul 01 '24 16:07 janhoy

Managed to fix SolrCloudExampleTest by adapt SolrCLI.getSolrClient to use sysProp hostContext from tests if exists. This made the test pass, but now the BATS test for PostTool started failing. Reason was that it used SolrCli.getSolrClient(solrUrl) to get its client for doing commit, with the full http://localhost:8983/solr/mycollection url as base URL. Which no longer works (should it?). So I modified PostTool to instead extract collection name from URL, get a clinent with baseURL and call client.commit(myCollection).

janhoy avatar Jul 01 '24 21:07 janhoy

BUILD SUCCESSFUL in 5m 10s (crave from laptop)

janhoy avatar Jul 01 '24 21:07 janhoy

@janhoy, just looking again at this one, thanks for all the efforts!!!! This looks like it's ready to commit? However before I click "Squash and merge", I wanted to confirm with you!!

epugh avatar Jul 11 '24 14:07 epugh

It’s the best I could do with my available time. I have NOT done a thorough review of all options in all tools, just fixed the ones causing test failures.

The windows script lacks tests so who knows what works or not? Perhaps we are able to do follow up work on that front?

If no one else are willing to do a more thorough manual review, I believe merging this as is, is better than not merging.

And very important: some of my commits may include fixes that should be forward ported to main. We should make a placeholder JIRA to research what that might be.

janhoy avatar Jul 15 '24 11:07 janhoy

Good thoughts here @janhoy... I will leave this open for another day or so, and then look to commit it. I will put out an appeal to the dev mailing list when 9.7 release process starts asking for additional testing of bin/solr.cmd...

epugh avatar Jul 15 '24 15:07 epugh

I filed SOLR-17371 for forward-porting. Also synced PR branch with branch_9x.

janhoy avatar Jul 16 '24 20:07 janhoy

This fails:

./gradlew :solr:core:test --tests "org.apache.solr.cloud.SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection" -Ptests.jvms=96 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1 -XX:ReservedCodeCacheSize=120m" -Ptests.seed=44FFA474CD768986 -Ptests.timeoutSuite=600000! -Ptests.file.encoding=UTF-8

however this passes:

./gradlew :solr:core:test --tests "org.apache.solr.cloud.SolrCloudExampleTest.testLoadDocsIntoGettingStartedCollection"

epugh avatar Jul 17 '24 08:07 epugh

okay, it fails on that tests.seed value. How can I step through the test in intellij with that seed value?

epugh avatar Jul 17 '24 08:07 epugh

looks like logic to identify a nodename assumes one _ and we have two _ chars!

epugh avatar Jul 17 '24 08:07 epugh