SOLR-16957: Test user managed cluster with a twist!

Open epugh opened this issue 2 years ago • 14 comments

https://issues.apache.org/jira/browse/SOLR-16957

Description

BATS test for user managed index replication. This is an end-to-end test, not a unit test for Bash scripts.

Solution

Fire up three independent Solrs, set up replication via apis, trigger it, and see what happens.

We demonstrate starting up three independent Solr nodes in the Leader/Repeater/Follower pattern. Then we create three separate 'techproducts' collections, uploading the same configset three separate times to demonstrate that there is no interconnection or shared config between them. We then index some XML data on the Leader and check that it flows through the Repeater to the Follower. This is repeated for some more documents. Lastly, we shut down the Repeater and demonstrate that the Follower still has all of its documents available for querying. We delete the data on the Leader and then bring the Repeater back up. After restarting, the Repeater preserves all of the configuration that was done during the setup process and immediately copies over the now empty 'techproducts' index, and we then see the Follower pick up that empty collection as well.

epugh avatar Aug 31 '23 14:08 epugh

Cool. I don't see the point in testing replication between two isolated SolrCloud nodes though, is that even supported? Are you thinking about some kind of usecase where you pull the index from a cloud cluster to an outside cluster for hot standby purposes?

I'm thinking that if having two standalone Solrs, each with their own embedded ZK, works, well, we just eliminated the need for traditional "standalone" Solr, while keeping an easy upgrade path for all the folks who want to continue to have user managed index replication. We just change "bin/solr start" to do what today you enable with "bin/solr start -c" and everything continues to work. Except now, everywhere we make a SolrCloud versus standalone decision, we only have SolrCloud. And all those tickets about "make X work in standalone Solr" are now obsolete...

epugh avatar Aug 31 '23 17:08 epugh

Fair enough, but that sounds like a new JIRA issue, not part of this test improvement?

I'm sceptical of planning a cluster with 6 nodes, each with its own source of truth in ZooKeeper. How would you update the schema of your collection? In standalone, the schema.xml file is replicated to the replica. But that will not work here, since each Solr reads its schema from its own local ZK. So then you need to do solr zk upconfig six times instead of once. Managing a split source of truth will be a nightmare. What if you want to back up your collection, do you then do six BACKUP calls? Etc.

I'm more in favor of improving solrcloud with replica modes to the point where there are no benefits of running standalone anymore.

janhoy avatar Aug 31 '23 17:08 janhoy

You are quite right that I am probably conflating this with my other experiment to see how it works. If you are running a cluster with six nodes, then you probably SHOULD be using SolrCloud and a proper ZK.

I'll split this up, and we do need to figure out how to come up with a path to eliminate the solrcloud versus standalone divide...

epugh avatar Aug 31 '23 18:08 epugh

@janhoy just to clarify, if I keep the NON-ZooKeeper end-to-end test for replication, do you see that as valuable and worth merging? I'll split the ZooKeeper version of the test out into its own PR... I'm interested in playing with it a bit more...

epugh avatar Sep 09 '23 11:09 epugh

Yea, not sure how much value it gives in addition to the replication handler tests in the test suite though? Can you comment on that?

janhoy avatar Sep 09 '23 18:09 janhoy

I think we're going to see a lot more change in this area: adding basic auth, the potential changes around ZK... So I'm thinking this helps build confidence that we didn't break anything. Does that seem like enough upside?

epugh avatar Sep 25 '23 13:09 epugh

Not very convinced still 😉

janhoy avatar Sep 25 '23 15:09 janhoy

;-) Okay. That's fair. I'll close this, and if we see value in the future we can reopen.

epugh avatar Sep 25 '23 15:09 epugh

I hate having PRs that just hang out open for years in GitHub ;-)

epugh avatar Sep 25 '23 15:09 epugh

@gerlowskija here is a proof of concept of user managed cluster based on our conversation last week!

epugh avatar Feb 11 '24 19:02 epugh

Change SOLR_PORT to LEADER_PORT, REPEATER_PORT, FOLLOWER_PORT... Also, could look up indexversion on the leader, and then wait for it on the repeater instead of sleeping...
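
A minimal sketch of that "wait instead of sleep" idea, assuming the nodes run on localhost and the core is named 'techproducts'. The helper names `index_version` and `wait_for_index_version` are hypothetical and not taken from the PR; the only Solr API assumed is the replication handler's `command=indexversion` request.

```shell
# Sketch only: helper names are hypothetical, not taken from the PR.

# Print the index version that a node's replication handler reports for
# the 'techproducts' core. $1 is the node's port.
index_version() {
  curl -s "http://localhost:$1/solr/techproducts/replication?command=indexversion&wt=json" \
    | grep -o '"indexversion":[[:space:]]*[0-9]*' \
    | grep -o '[0-9][0-9]*$'
}

# Poll once per second until the node on port $2 reports version $1.
# Gives up after $3 seconds (default 30) and returns non-zero.
wait_for_index_version() {
  local expected="$1" port="$2" timeout="${3:-30}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ "$(index_version "$port")" = "$expected" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

In a test this might be used by capturing `index_version "$LEADER_PORT"` right after indexing and then calling `wait_for_index_version` with that value and `$REPEATER_PORT`, so the test proceeds as soon as replication catches up and fails with a clear timeout otherwise.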

epugh avatar Feb 19 '24 20:02 epugh

This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!

github-actions[bot] avatar Apr 24 '24 00:04 github-actions[bot]

This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.

github-actions[bot] avatar Oct 07 '24 00:10 github-actions[bot]

Still working on how/when this type of integration test becomes part of Solr!

epugh avatar Oct 07 '24 10:10 epugh

I am back on the path of wanting to get this in. In SOLR-17492 (and the PR https://github.com/apache/solr/pull/2783) we talk about how to run Solr. However, how do we actually KNOW that it works? We see a lot of bugs that come from specific combinations of auths, cluster shape, features etc. While there may be more robust ways of testing these combinations, BATS is one way that we have here today. Maybe we have a separate directory of them that gets run less frequently, but that validates the various deployment scenarios?

epugh avatar Nov 01 '24 16:11 epugh