SOLR-16957: Test user managed cluster with a twist!
https://issues.apache.org/jira/browse/SOLR-16957
Description
BATS test for user managed index replication. This is an end-to-end test, not a unit test for Bash scripts.
Solution
Fire up three independent Solrs, set up replication via APIs, trigger it, and see what happens.
We demonstrate starting up three independent Solr nodes in the Leader/Repeater/Follower pattern. Then we create three separate 'techproducts' collections, uploading the same configset three separate times to demonstrate that there is no interconnection or shared config between them. We then index some XML data on the Leader and check that it flows through the Repeater to the Follower. This is repeated for some more documents. Lastly, we shut down the Repeater and demonstrate that the Follower still has all of its documents available for querying. We delete the data on the Leader, and then subsequently bring the Repeater back up. After restarting, the Repeater preserves all of the configuration that was done during the setup process and immediately copies over the now empty 'techproducts' index, and we then see the Follower pick up that empty collection as well.
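The "check that it flows through the Repeater to the Follower" step could be sketched roughly as below. This is a minimal illustration, not the code from the PR: the helper names (`parse_num_found`, `doc_count`, `all_nodes_have`) and the port defaults are hypothetical, and it assumes each node serves 'techproducts' at the standard `/select` endpoint on localhost.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and port defaults are assumptions, not taken from the PR.
LEADER_PORT="${LEADER_PORT:-8983}"
REPEATER_PORT="${REPEATER_PORT:-8984}"
FOLLOWER_PORT="${FOLLOWER_PORT:-8985}"

# Read a /select JSON response on stdin and print its numFound value.
parse_num_found() {
  sed -n 's/.*"numFound"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the total number of documents on the node listening on the given port.
doc_count() {
  local port="$1" core="${2:-techproducts}"
  curl -s "http://localhost:${port}/solr/${core}/select?q=*:*&rows=0" | parse_num_found
}

# Succeed only if every listed port reports the expected document count.
all_nodes_have() {
  local expected="$1"; shift
  local port actual
  for port in "$@"; do
    actual="$(doc_count "$port")"
    if [[ "$actual" != "$expected" ]]; then
      echo "port ${port}: expected ${expected}, got '${actual}'" >&2
      return 1
    fi
  done
  return 0
}

# Example use inside a test, once replication has had time to run
# (EXPECTED_DOCS is whatever count the test just indexed on the Leader):
#   all_nodes_have "$EXPECTED_DOCS" "$LEADER_PORT" "$REPEATER_PORT" "$FOLLOWER_PORT"
```

Comparing counts on all three nodes in one helper keeps the test body short, and the same call with an expected count of 0 would cover the final "empty index" step.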
Cool. I don't see the point in testing replication between two isolated SolrCloud nodes though, is that even supported? Are you thinking about some kind of use case where you pull the index from a cloud cluster to an outside cluster for hot standby purposes?
I'm thinking that if having two standalone Solrs, each with its own embedded ZK, works, well, we just eliminated the need for traditional "standalone" Solr, while keeping an easy upgrade path for all the folks who want to continue to have user managed index replication. We just change "bin/solr start" to do what today you enable with "bin/solr start -c" and everything continues to work. Except now, everywhere we make a SolrCloud versus standalone decision, we only have SolrCloud. And all those tickets about "make X work in standalone Solr" are now obsolete...
Fair enough, but that sounds like a new JIRA issue, not part of this test improvement?
I'm sceptical about planning a cluster with 6 nodes, each with its own source of truth in ZooKeeper. How would you update the schema of your collection? In standalone, the schema.xml file is replicated to the replica. But that will not work here since Solr reads its schema from each local ZK. So then you need to run solr zk upconfig six times instead of once. Managing a split source of truth will be a nightmare. What if you want to back up your collection, do you then make six BACKUP calls? Etc.
I'm more in favor of improving SolrCloud with replica modes to the point where there are no benefits of running standalone anymore.
You are quite right that I'm probably conflating this with my other experiment to see how it works. If you are running a cluster with six nodes, then you probably SHOULD be using SolrCloud and a proper ZK.
I'll split this up, and we do need to figure out how to come up with a path to eliminate the SolrCloud versus standalone divide...
@janhoy just to clarify, if I keep the NON-ZooKeeper end-to-end test for replication, do you see that as valuable and worth merging? I'll split the ZooKeeper version of the test out into its own PR... I'm interested in playing with it a bit more...
Yeah, not sure how much value it adds beyond the replication handler tests in the test suite, though. Can you comment on that?
I think we're going to see a lot more change in this area: adding basic auth, the potential changes around ZK... so I'm thinking this helps build confidence that we didn't break anything. Does that seem like enough upside?
Not very convinced still 😉
;-) Okay. That's fair. I'll close this, and if we see value in the future we can reopen.
I hate having PRs that just hang out open for years on GitHub ;-)
@gerlowskija here is a proof of concept of user managed cluster based on our conversation last week!
Change SOLR_PORT to LEADER_PORT, REPEATER_PORT, FOLLOWER_PORT... Also, could look up the indexversion on the leader, and then wait for it on the repeater instead of using sleep...
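One way the "wait for the indexversion instead of sleep" idea might look is sketched below. It is an illustration under assumptions rather than the PR's implementation: the helper names (`parse_indexversion`, `fetch_indexversion`, `wait_for_value`), the port defaults, and the 30 second timeout in the usage comment are all hypothetical. It relies on the replication handler's `command=indexversion` request, which returns the current `indexversion` and `generation`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, ports and timeouts are assumptions, not taken from the PR.
LEADER_PORT="${LEADER_PORT:-8983}"
REPEATER_PORT="${REPEATER_PORT:-8984}"

# Read a replication handler JSON response on stdin and print its indexversion.
parse_indexversion() {
  sed -n 's/.*"indexversion"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Ask the node on the given port for the index version of a core.
fetch_indexversion() {
  local port="$1" core="${2:-techproducts}"
  curl -s "http://localhost:${port}/solr/${core}/replication?command=indexversion&wt=json" \
    | parse_indexversion
}

# Run the command given after the first two arguments once per second until it
# prints the expected value; give up after the given number of attempts.
wait_for_value() {
  local expected="$1" timeout="$2"; shift 2
  local waited=0 actual
  while [ "$waited" -lt "$timeout" ]; do
    actual="$("$@")"
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example use inside a test:
#   leader_version="$(fetch_indexversion "$LEADER_PORT")"
#   wait_for_value "$leader_version" 30 fetch_indexversion "$REPEATER_PORT"
```

Polling on a concrete version keeps the test from passing or failing based on how long replication happens to take on a given machine, at the cost of one extra request per second while waiting.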
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the [email protected] mailing list. Thank you for your contribution!
This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again.
Still working on how/when this type of integration test becomes part of Solr!
I am back on the path of wanting to get this in. In SOLR-17492 (and the PR https://github.com/apache/solr/pull/2783) we talk about how to run Solr. However, how do we actually KNOW that it works? We see a lot of bugs that come from specific combinations of auth, cluster shape, features, etc. While there may be more robust ways of testing these combinations, BATS is one way that we have here today. Maybe we have a separate directory of them that gets run less frequently, but that validates the various deployment scenarios?