solr-operator
Getting unauthorized requests for cluster/replicas/balance in v0.8.0
Hi Team,
We recently upgraded the operator from v0.7.0 to v0.8.0 for our SolrCloud cluster (Solr 9.4).
It seems that the new version requests the status of an async task named balance-replicas-ScaleUp every 60 seconds. When it doesn't find that task running, it sends a cluster/replicas/balance request to the target Solr, and every attempt fails with the following error:
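To make that loop concrete, here is a minimal Python sketch of the two requests involved. It assumes a Solr base URL such as http://solr:8983/solr, and the helper names are hypothetical; this is not the operator's actual (Go) code. REQUESTSTATUS is the standard Collections API action for checking async tasks. The `____v2` path and the async id come from the error and description above, the balance body fields are assumed from the Solr 9.x Balance Replicas docs, and `should_submit_balance` is only a simplified reading of "when it doesn't find that task running".

```python
import json
import urllib.parse

# Async id observed in this issue.
ASYNC_ID = "balance-replicas-ScaleUp"


def request_status_url(solr_base: str, async_id: str) -> str:
    """Collections API URL that reports the state of an async task."""
    query = urllib.parse.urlencode({"action": "REQUESTSTATUS", "requestid": async_id})
    return f"{solr_base.rstrip('/')}/admin/collections?{query}"


def balance_request(solr_base: str, async_id: str, nodes=None):
    """Return (url, json_body) for an async v2 BalanceReplicas call (assumed body shape)."""
    body = {"waitForFinalState": False, "async": async_id}
    if nodes:
        body["nodes"] = list(nodes)  # optional: restrict balancing to these nodes
    url = f"{solr_base.rstrip('/')}/____v2/cluster/replicas/balance"
    return url, json.dumps(body)


def should_submit_balance(status_response: dict) -> bool:
    """Hypothetical rule: submit again unless Solr reports the task as submitted or running."""
    state = status_response.get("status", {}).get("state")
    return state not in ("submitted", "running")
```

If the user sending the balance request lacks permission for that path, every pass through this loop ends in the 403 shown below.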
Recieved bad response code of 403 from solr with response: {
\"servlet\":\"default\",
\"message\":\"Unauthorized request, Response code: 403\",
\"url\":\"/solr/____v2/cluster/replicas/balance\",
\"status\":\"403\"
}
I found that the relevant metadata for this operation is written to the annotations of the StatefulSet object: "solr.apache.org/clusterOpsLock": "{\"operation\":\"BalanceReplicas\",\"lastStartTime\":\"2023-11-09T14:28:18Z\",\"metadata\":\"ScaleUp\"}"
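The annotation value is plain JSON, so it is easy to inspect when debugging. Below is a small sketch that reads it from a StatefulSet's annotations dict (for example `.metadata.annotations` from `kubectl get statefulset <name> -o json`) and computes how long the lock has been held. The class and function names are illustrative only; the field names are the ones visible in the annotation above.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

LOCK_ANNOTATION = "solr.apache.org/clusterOpsLock"


@dataclass
class ClusterOpsLock:
    operation: str
    last_start_time: datetime
    metadata: str


def parse_cluster_ops_lock(annotations: dict) -> Optional[ClusterOpsLock]:
    """Return the parsed lock, or None if no cluster operation is in progress."""
    raw = annotations.get(LOCK_ANNOTATION)
    if raw is None:
        return None
    data = json.loads(raw)
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    started = datetime.fromisoformat(data["lastStartTime"].replace("Z", "+00:00"))
    return ClusterOpsLock(data["operation"], started, data.get("metadata", ""))


def lock_age(lock: ClusterOpsLock, now: Optional[datetime] = None) -> timedelta:
    """How long ago the locked operation started."""
    now = now or datetime.now(timezone.utc)
    return now - lock.last_start_time
```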
We're using the default security.json credentials, and I'm not sure whether anything needs to change in our settings for this.
Thanks!
Not sure, but this could be a similar or related issue. After upgrading from v0.7.0 to v0.8.0 for our SolrCloud cluster (Solr 9.0), we're getting a 404 from the Solr API:
Error returned from Solr API: 404. no core retrieved for core name: null. Path : /cluster/replicas/balance
2023-12-22T06:37:54Z INFO Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "01215b5c-6555-404c-85f5-2a5246ef41cb"}
2023-12-22T06:37:54Z ERROR Reconciler error {"controller": "solrcloud", "controllerGroup": "solr.apache.org", "controllerKind": "SolrCloud", "SolrCloud": {"name":"solr","namespace":"solr"}, "namespace": "solr", "name": "solr", "reconcileID": "01215b5c-6555-404c-85f5-2a5246ef41cb", "error": "Error returned from Solr API: 404. no core retrieved for core name: null. Path : /cluster/replicas/balance"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:227
The same error shows up on the Solr side:
2023-12-22 06:40:54.937 ERROR (qtp1942828992-2283) [] o.a.s.a.V2HttpCall >> path: '/cluster/replicas/balance'
2023-12-22 06:40:54.937 ERROR (qtp1942828992-2283) [] o.a.s.a.V2HttpCall Error in init() => org.apache.solr.common.SolrException: no core retrieved for core name: null. Path : /cluster/replicas/balance
    at org.apache.solr.api.V2HttpCall.init(V2HttpCall.java:155)
org.apache.solr.common.SolrException: no core retrieved for core name: null. Path : /cluster/replicas/balance
Why is the operator hitting this path?
Thanks for any help.
@mmoscher, that is expected. After an ephemeral rolling restart completes, the Solr Operator now tries to balance the cluster. If Solr doesn't support that command (and 9.0 does not), the operator simply completes the operation. The only way the Solr Operator can know whether the command is supported is to try running it.
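Put differently, support has to be inferred from Solr's response. The following is a hypothetical Python sketch of that decision, assuming a 404 (as seen on 9.0 above) means "not supported, treat the balance as done" and any other failure, such as the 403 from the original report, is retried on the next pass. The names are made up for illustration and the real operator logic may differ.

```python
from enum import Enum


class BalanceOutcome(Enum):
    STARTED = "started"          # Solr accepted the balance request
    UNSUPPORTED = "unsupported"  # endpoint missing: mark the operation complete
    RETRY = "retry"              # any other error (e.g. 401/403): keep the lock, try again


def classify_balance_response(status_code: int) -> BalanceOutcome:
    """Map the HTTP status of a balance request to what the caller should do next."""
    if 200 <= status_code < 300:
        return BalanceOutcome.STARTED
    if status_code == 404:
        return BalanceOutcome.UNSUPPORTED
    return BalanceOutcome.RETRY
```

Under this reading, a 403 never ends the operation, which matches the repeated attempts every 60 seconds described in the first report.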
@ozlerhakan, I think that is something we missed. I'll try to open a PR for it soon.
Is there a workaround for this? We're getting the same error after starting up a brand new cluster.
As a workaround on a new cluster, you can manually add a permission for /____v2/cluster/replicas/balance on the Security screen of the Solr Admin UI and make sure the k8s-oper and admin users are allowed to use it.
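The same change can also be scripted against Solr's Authorization API instead of using the Admin UI. This is a minimal Python sketch, assuming the RuleBasedAuthorizationPlugin from the default security.json and basic auth. The permission name and role names are placeholders (check which roles your k8s-oper and admin users actually have), and the path is the one given in the comment above. Because Solr applies the first permission that matches a request, the new rule may need a `before` index so it sits ahead of a broader rule such as `all`.

```python
import base64
import json
import urllib.request


def set_permission_payload(name, path, roles, before=None):
    """Body for the RuleBasedAuthorizationPlugin 'set-permission' command."""
    permission = {
        "name": name,        # placeholder name for a custom permission
        "collection": None,  # cluster-level path, not tied to a collection
        "path": path,
        "role": list(roles),
    }
    if before is not None:
        # Insert ahead of the existing permission with this index,
        # since the first matching permission wins.
        permission["before"] = before
    return {"set-permission": permission}


def post_authorization_command(solr_url, user, password, payload):
    """POST a command to /admin/authorization using basic auth; returns the parsed JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(
        f"{solr_url.rstrip('/')}/admin/authorization",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `post_authorization_command("http://localhost:8983/solr", "admin", "<password>", set_permission_payload("balance-replicas", "/____v2/cluster/replicas/balance", ["admin", "k8s"]))` would submit it. A GET on the same endpoint lists the current permissions if you need to pick a `before` index.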
However, just a warning: rebalancing doesn't seem to work very well, so you may want to leave it off. It seems to create a whole bunch of replicas, time out, and then leave a bunch of replicas on the new nodes without deleting the ones on the old node. After failing, it runs the rebalance again and instantly declares it a "success", even though the replicas never got moved properly and you now have more replicas than you started with. From the Solr (9.6.1) logs:
2025-03-15 00:59:28.579 INFO (OverseerThreadFactory-120-thread-2) [c: s: r: x: t:] o.a.s.c.a.c.ReplicaMigrationUtils Timed out waiting for 1 leader replicas to recover
2025-03-15 00:59:28.579 INFO (OverseerThreadFactory-120-thread-2) [c: s: r: x: t:] o.a.s.c.a.c.ReplicaMigrationUtils Failed to create some replicas. Cleaning up all newly created replicas.
org.apache.solr.common.SolrException: collection,shard,replica are required params
    at org.apache.solr.cloud.api.collections.CollectionHandlingUtils.checkRequired(CollectionHandlingUtils.java:196) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:86) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:130) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
2025-03-15 00:59:28.579 ERROR (OverseerThreadFactory-120-thread-2) [c: s: r: x: t:] o.a.s.c.a.c.CollectionHandlingUtils Operation balance_replicas failed => org.apache.solr.common.SolrException: collection,shard,replica are required params
    at org.apache.solr.cloud.api.collections.CollectionHandlingUtils.checkRequired(CollectionHandlingUtils.java:196) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.DeleteReplicaCmd.deleteReplica(DeleteReplicaCmd.java:86) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.ReplicaMigrationUtils.migrateReplicas(ReplicaMigrationUtils.java:217) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.BalanceReplicasCmd.call(BalanceReplicasCmd.java:81) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:130) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:363) ~[solr-solrj-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
2025-03-15 00:59:28.579 WARN (OverseerThreadFactory-120-thread-2) [c: s: r: x: t:] o.a.s.c.a.c.ReplicaMigrationUtils Error deleting replica => org.apache.solr.common.SolrException: collection,shard,replica are required params
    at org.apache.solr.cloud.api.collections.CollectionHandlingUtils.checkRequired(CollectionHandlingUtils.java:196)
    at org.apache.solr.cloud.api.collections.ReplicaMigrationUtils.migrateReplicas(ReplicaMigrationUtils.java:217) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.BalanceReplicasCmd.call(BalanceReplicasCmd.java:81) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.api.collections.CollApiCmds$TraceAwareCommand.call(CollApiCmds.java:225) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:564) ~[solr-core-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:363) ~[solr-solrj-9.6.1.jar:9.6.1 d7f7166567f52f1b31e3315b0188e11f2c4c9b60 - houston - 2024-05-23 13:50:22]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    at java.base/java.lang.Thread.run(Unknown Source) [?:?]