solr
solr copied to clipboard
SOLR-15342: Separate out a SolrJ-Zookeeper module
https://issues.apache.org/jira/browse/SOLR-15342
I open this PR in draft to share the evolution of the work. I closed the old PR since it has been open for a while, there have been quite a few changes since then in relation to ZooKeeper and some of the work has been contributed in a separate PR. I resume the work done in the latter in this one
Thanks for getting this started again @haythemkh! As a first-pass I suggest that we put the current solrj functionality into solrj/core. Once that is committed, we can then do what this PR does, but split it out into solrj/zookeeper instead of solrj-zookeeper. I do not think we want to have all of the solrj libraries as folders under solr/ directly.
I will get a draft of this up sometime today.
Once that is committed, we can then do what this PR does
@HoustonPutman A change like that risks Haythem having to start over an additional time. I am encouraging him to get this done quickly so avoid it getting stale as what happened last time (which was depressing). If we want to have solrj/core then I propose that happen in this issue at the end, or in a follow-on PR.
Ok I will do it following this getting merged
oops, was trying to be helpful resolving the merge conflict but forgot to compile locally before pushing, sorry. will try to fix, or else commit reversion logically should also be possible? sorry for the noise.
oops, was trying to be helpful resolving the merge conflict but forgot to compile locally before pushing, sorry. will try to fix, or else commit reversion logically should also be possible? sorry for the noise.
added 3 commits to make it compile a little more but not fully. not sure if it was compiling before actually. again, sorry for the noise.
@haythemkh I know you are using the Google Java Format IntelliJ plugin but not the IntelliJ code style that accompanies it, which ensures that the import wildcards are configured correctly (the plugin can't do that part). See https://github.com/google/google-java-format#intellij-android-studio-and-other-jetbrains-ides
FYI @epugh it would be cool to mention this somewhere for contributor onboarding
I would love to have your feedback on what we want to do with the nocommits. These include
- Copy of ZkStateReader constants to solrj classes (Utils, PerReplicaStates, ClusterState, DocCollection, Slice, Replica, ZkCoreNodeProps, CloudSolrClient, BaseHttpClusterStateProvider, CollectionAdminRequest, CoreAdminRequest) as ZkStateReader is moving to solrj-zookeeper
- Use raw "http" instead of going through #getClusterProperty in DatabaseMetaDataImpl
Is there any reason for PRS not to be in the zk module? I don't think it would be used if we aren't communicating with ZK.
Copy of ZkStateReader constants to solrj classes
Would it make more sense to have one ClusterStateProps class to hold these? We could also split it into ClusterProps, CollectionProps, etc, but that might be overkill.
PerReplicaStates is referenced in ClusterState, DocCollection, and Replica. Based on my very limited understanding of PRS, it's a part of the state of a collection no matter how it's retrieved (via ZK or via SolrClient based ClusterStateProvider). Am I right @noblepaul @chatman ?
I do like your suggestion of those 3 "Props" classes for constants. How do we get there from here? Directly, even if it changes import statement in lots more classes (I would assume)? This is the easiest path. Or do we leave that for another JIRA issue (soon) and for now just remove the nocommits about this to make precommit happy? Also relatively easy but leaves some WIP. Or go part way here & now -- add the new classes & constants but only use them in solrj-zookeeper right now and then migrate the rest in another issue? More work.
I would like to point out that this new separation did move some tests to solrj-zookeeper but it's only four -- that's it. Many solrj tests indirectly use ZK and I suppose that's okay. There is a compile time dependency on ZK from solrj tests that is unfortunate but couldn't be avoided in a few cases (without investing additional work):
- StreamingTest.streamLocalTests calls
zkStateReader.forceUpdateCollection - TestCloudSolrClientConnections calls cluster.getZkClient()
- org.apache.solr.client.solrj.impl.HttpClusterStateSSLTest#testHttpClusterStateWithSSL calls cluster.getZkClient() Perhaps these might even move but on the other hand, it's not some big deal to add an explicit dependency on solrj tests to ZK when ZK is definitely already there at runtime via the solrj-zookeeper dependency.
There is a weird dependency check problem that I can't figure out:
Execution failed for task ':solr:solrj-zookeeper:analyzeTestClassesDependencies'.
> Dependency analysis found issues.
usedUndeclaredArtifacts
- org.apache.solr:solrj-zookeeper:10.0.0-SNAPSHOT@
So solrj-zookeeper's tests depend on solrj-zookeeper. Well of course it does ;-)
A big obstacle I'm seeing is split packages between solrj & solrj-zookeeper with respect to javadocs. For example the javadocs for solr-core will now contain broken links to most solrj-zookeeper classes because the javadoc tool can't figure this out (kind of a long-standing deficiency).
Personally, I'm not very sympathetic to the needs of continuing to maintain a javadoc site. I don't use them, I doubt others do, and moreover for open-source, really the source is what we want to see because it's better / more. That's me, anyway. I don't think maintaining them is more important than other goals like modularization, and thus I'm willing for there to be some degradation there.
Options:
- In our javadocs, stop linking out to solrj or solrj-zookeeper because they cannot be differentiated by the standard javadoc tool.
- Accept broken links; stop checking for them within javadoc folders.
- Fix it (not something I'm willing to invest in)
Of course in main/10.0 we can remove the split packages but that's a back-combat break that isn't okay for 9.x.
CC @uschindler -- I'm curious what you think
- In our javadocs, stop linking out to solrj or solrj-zookeeper because they cannot be differentiated by the standard javadoc tool.
This seems acceptable for 9.x, and in main branch we could change package names and re-introduce some links if needed?
@sigram There is an abstraction SolrCloudManager and some related classes (DistributedQueue, DistribStateManager, NodeStateProvider) that you introduced once upon a time that don't seem to have anything to do with SolrJ yet you put them there. I believe these could go to Solr Core. Most need to at least go to Solrj-zookeeper because of DistributedQueue's reference of ZK classes.
For constants related to the collection's state, I put it in an inner interface DocCollection.CollectionSProps. The "S" there is for State. If that's removed, then it's possibly confused with CollectionProperties. It's mostly an internal set of constants (it's not parameters) so I think it's okay it it doesn't roll off the tongue. Likewise I put constants for the Replica in Replica.ReplicaSProps. I'm not married to these choices.
There is a CollectionAdminParams class but it was missing quite a number of parameters that CollectionAdminRequest needed to specify so I moved them there.
There were a handful of other constants then I mostly made private and didn't fret over possible redundancy with ZkStateReader.
Overall I like having these props or params as constants in suitable locations, especially when we don't static import these constants. It adds clarity to the spots where they are referred to as being related to parameters or properties or whatever it is.
The only thing that remains is the javadoc split package concern. It should only be a handful of lines of code or so in render-javadoc.adoc to special case them to not inter-link.
At the risk of bikeshedding.... How about ReplicaStateProps.LEADER instead of ReplicaSProps.LEADER since we don't have an existing concept of SProps throughout our codebase?
Other than a CHANGES.txt entry, I think this PR is ready.
Looks like the content about CloudSolrClient in the ref guide (https://solr.apache.org/guide/solr/latest/deployment-guide/solrj.html#building-and-running-solrj-applications) refers to the use of the solr url for confiugration I wonder if we should tweak the text thgouh to stress that using ZK is a vulnerability? Similar to what you put in the upgrade notes?
Also, do we need to mention the need for solrj-zookeeper dependency for CloudSolrClient users?
Really looking forward to this!
Docs look good!
🚢 ???
Planned commit message after the first line:
• CloudSolrClient builders use reflection to instantiate ZkClientClusterStateProvider when needed
• CollectionAdminParams constants
• CollectionStateProps constants
• SliceStateProps constants
• ReplicaStateProps constants
• SolrJ streaming expressions: don’t depend on ZK anymore
• javadocs: temporarily unlink to solrj modules due to split package issue
• PerReplicaStates.fetch -> PerReplicaStatesFetcher
• ZkStateReader.getCollectionPath -> DocCollection.getCollectionPath
Some other code changes seemed too obscure to mention.
I think it's done and I've addressed comments. I would love to see this merged soon so as to unblock other efforts that touch lots of files. If nobody has further review comments, I'll merge it Thursday.