solr SOLR-15342: Separate out a SolrJ-Zookeeper module

trafficstars

https://issues.apache.org/jira/browse/SOLR-15342

I open this PR in draft to share the evolution of the work. I closed the old PR since it has been open for a while, there have been quite a few changes since then in relation to ZooKeeper and some of the work has been contributed in a separate PR. I resume the work done in the latter in this one

Jul 15 '22 15:07 heythm

Thanks for getting this started again @haythemkh! As a first-pass I suggest that we put the current solrj functionality into solrj/core. Once that is committed, we can then do what this PR does, but split it out into solrj/zookeeper instead of solrj-zookeeper. I do not think we want to have all of the solrj libraries as folders under solr/ directly.

I will get a draft of this up sometime today.

Jul 15 '22 16:07 HoustonPutman

Once that is committed, we can then do what this PR does

@HoustonPutman A change like that risks Haythem having to start over an additional time. I am encouraging him to get this done quickly so avoid it getting stale as what happened last time (which was depressing). If we want to have solrj/core then I propose that happen in this issue at the end, or in a follow-on PR.

Jul 15 '22 18:07 dsmiley

Ok I will do it following this getting merged

Jul 15 '22 18:07 HoustonPutman

oops, was trying to be helpful resolving the merge conflict but forgot to compile locally before pushing, sorry. will try to fix, or else commit reversion logically should also be possible? sorry for the noise.

Jul 29 '22 16:07 cpoerschke

oops, was trying to be helpful resolving the merge conflict but forgot to compile locally before pushing, sorry. will try to fix, or else commit reversion logically should also be possible? sorry for the noise.

added 3 commits to make it compile a little more but not fully. not sure if it was compiling before actually. again, sorry for the noise.

Jul 29 '22 17:07 cpoerschke

@haythemkh I know you are using the Google Java Format IntelliJ plugin but not the IntelliJ code style that accompanies it, which ensures that the import wildcards are configured correctly (the plugin can't do that part). See https://github.com/google/google-java-format#intellij-android-studio-and-other-jetbrains-ides

FYI @epugh it would be cool to mention this somewhere for contributor onboarding

Jul 30 '22 05:07 dsmiley

I would love to have your feedback on what we want to do with the nocommits. These include

Copy of ZkStateReader constants to solrj classes (Utils, PerReplicaStates, ClusterState, DocCollection, Slice, Replica, ZkCoreNodeProps, CloudSolrClient, BaseHttpClusterStateProvider, CollectionAdminRequest, CoreAdminRequest) as ZkStateReader is moving to solrj-zookeeper
Use raw "http" instead of going through #getClusterProperty in DatabaseMetaDataImpl

Aug 01 '22 15:08 heythm

PerReplicaStates

Is there any reason for PRS not to be in the zk module? I don't think it would be used if we aren't communicating with ZK.

Copy of ZkStateReader constants to solrj classes

Would it make more sense to have one ClusterStateProps class to hold these? We could also split it into ClusterProps, CollectionProps, etc, but that might be overkill.

Aug 05 '22 16:08 HoustonPutman

PerReplicaStates is referenced in ClusterState, DocCollection, and Replica. Based on my very limited understanding of PRS, it's a part of the state of a collection no matter how it's retrieved (via ZK or via SolrClient based ClusterStateProvider). Am I right @noblepaul @chatman ?

I do like your suggestion of those 3 "Props" classes for constants. How do we get there from here? Directly, even if it changes import statement in lots more classes (I would assume)? This is the easiest path. Or do we leave that for another JIRA issue (soon) and for now just remove the nocommits about this to make precommit happy? Also relatively easy but leaves some WIP. Or go part way here & now -- add the new classes & constants but only use them in solrj-zookeeper right now and then migrate the rest in another issue? More work.

I would like to point out that this new separation did move some tests to solrj-zookeeper but it's only four -- that's it. Many solrj tests indirectly use ZK and I suppose that's okay. There is a compile time dependency on ZK from solrj tests that is unfortunate but couldn't be avoided in a few cases (without investing additional work):

StreamingTest.streamLocalTests calls zkStateReader.forceUpdateCollection
TestCloudSolrClientConnections calls cluster.getZkClient()
org.apache.solr.client.solrj.impl.HttpClusterStateSSLTest#testHttpClusterStateWithSSL calls cluster.getZkClient() Perhaps these might even move but on the other hand, it's not some big deal to add an explicit dependency on solrj tests to ZK when ZK is definitely already there at runtime via the solrj-zookeeper dependency.

There is a weird dependency check problem that I can't figure out:

Execution failed for task ':solr:solrj-zookeeper:analyzeTestClassesDependencies'.
> Dependency analysis found issues.
  usedUndeclaredArtifacts
   - org.apache.solr:solrj-zookeeper:10.0.0-SNAPSHOT@

So solrj-zookeeper's tests depend on solrj-zookeeper. Well of course it does ;-)

Aug 05 '22 21:08 dsmiley

A big obstacle I'm seeing is split packages between solrj & solrj-zookeeper with respect to javadocs. For example the javadocs for solr-core will now contain broken links to most solrj-zookeeper classes because the javadoc tool can't figure this out (kind of a long-standing deficiency).

Personally, I'm not very sympathetic to the needs of continuing to maintain a javadoc site. I don't use them, I doubt others do, and moreover for open-source, really the source is what we want to see because it's better / more. That's me, anyway. I don't think maintaining them is more important than other goals like modularization, and thus I'm willing for there to be some degradation there.

Options:

In our javadocs, stop linking out to solrj or solrj-zookeeper because they cannot be differentiated by the standard javadoc tool.
Accept broken links; stop checking for them within javadoc folders.
Fix it (not something I'm willing to invest in)

Of course in main/10.0 we can remove the split packages but that's a back-combat break that isn't okay for 9.x.

CC @uschindler -- I'm curious what you think

Aug 14 '22 05:08 dsmiley

In our javadocs, stop linking out to solrj or solrj-zookeeper because they cannot be differentiated by the standard javadoc tool.

This seems acceptable for 9.x, and in main branch we could change package names and re-introduce some links if needed?

Aug 14 '22 15:08 janhoy

@sigram There is an abstraction SolrCloudManager and some related classes (DistributedQueue, DistribStateManager, NodeStateProvider) that you introduced once upon a time that don't seem to have anything to do with SolrJ yet you put them there. I believe these could go to Solr Core. Most need to at least go to Solrj-zookeeper because of DistributedQueue's reference of ZK classes.

Aug 15 '22 03:08 dsmiley

For constants related to the collection's state, I put it in an inner interface DocCollection.CollectionSProps. The "S" there is for State. If that's removed, then it's possibly confused with CollectionProperties. It's mostly an internal set of constants (it's not parameters) so I think it's okay it it doesn't roll off the tongue. Likewise I put constants for the Replica in Replica.ReplicaSProps. I'm not married to these choices.

There is a CollectionAdminParams class but it was missing quite a number of parameters that CollectionAdminRequest needed to specify so I moved them there.

There were a handful of other constants then I mostly made private and didn't fret over possible redundancy with ZkStateReader.

Overall I like having these props or params as constants in suitable locations, especially when we don't static import these constants. It adds clarity to the spots where they are referred to as being related to parameters or properties or whatever it is.

The only thing that remains is the javadoc split package concern. It should only be a handful of lines of code or so in render-javadoc.adoc to special case them to not inter-link.

Aug 16 '22 04:08 dsmiley

At the risk of bikeshedding.... How about ReplicaStateProps.LEADER instead of ReplicaSProps.LEADER since we don't have an existing concept of SProps throughout our codebase?

Aug 16 '22 12:08 epugh

Other than a CHANGES.txt entry, I think this PR is ready.

Aug 17 '22 01:08 dsmiley

Looks like the content about CloudSolrClient in the ref guide (https://solr.apache.org/guide/solr/latest/deployment-guide/solrj.html#building-and-running-solrj-applications) refers to the use of the solr url for confiugration I wonder if we should tweak the text thgouh to stress that using ZK is a vulnerability? Similar to what you put in the upgrade notes?

Also, do we need to mention the need for solrj-zookeeper dependency for CloudSolrClient users?

Aug 17 '22 11:08 epugh

Really looking forward to this!

Aug 17 '22 11:08 epugh

Docs look good!

Aug 18 '22 12:08 epugh

🚢 ???

Aug 18 '22 12:08 epugh

Planned commit message after the first line:

• CloudSolrClient builders use reflection to instantiate ZkClientClusterStateProvider when needed
• CollectionAdminParams constants
• CollectionStateProps constants
• SliceStateProps constants
• ReplicaStateProps constants
• SolrJ streaming expressions: don’t depend on ZK anymore
• javadocs: temporarily unlink to solrj modules due to split package issue
• PerReplicaStates.fetch -> PerReplicaStatesFetcher
• ZkStateReader.getCollectionPath -> DocCollection.getCollectionPath

Some other code changes seemed too obscure to mention.

Aug 18 '22 14:08 dsmiley

I think it's done and I've addressed comments. I would love to see this merged soon so as to unblock other efforts that touch lots of files. If nobody has further review comments, I'll merge it Thursday.

Aug 22 '22 17:08 dsmiley

solr solr copied to clipboard

SOLR-15342: Separate out a SolrJ-Zookeeper module

solr
solr copied to clipboard