rdf4j icon indicating copy to clipboard operation
rdf4j copied to clipboard

GH-5090 Lucene 9 version of the Lucene SAIL

Open reckart opened this issue 1 year ago • 6 comments

GitHub issue resolved: #5090

Briefly describe the changes proposed in this PR:

  • Copied the Lucene SAIL implementation and upgraded it to Lucene 9

PR Author Checklist (see the contributor guidelines for more details):

  • [x] my pull request is self-contained
  • [x] I've added tests for the changes I made
  • [x] I've applied code formatting (you can use mvn process-resources to format from the command line)
  • [x] I've squashed my commits where necessary
  • [x] every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

reckart avatar Jul 26 '24 15:07 reckart

@hmottestad build fails due to the Lucene 9 dependencies apparently not being on the whitelist.

reckart avatar Jul 26 '24 15:07 reckart

There is a license issue that the Eclipse IP-team needs to check for us. I'll try to look into that when I have some time.

The second seems to be a Javadocs issue. That might just be that we need to whitelist something, but I'm not sure. Could also be an issue with some transitive dependencies in Lucene 9.

hmottestad avatar Jul 27 '24 05:07 hmottestad

I'm still not very comfortable with having two Lucene sails. Mixing two versions of the same library has previously led to issues down the road. If someone was depending on the parent pom I would assume they would get both versions of the Lucene library.

hmottestad avatar Jul 27 '24 06:07 hmottestad

Some relevant discussion: https://github.com/eclipse-rdf4j/rdf4j/discussions/5037

hmottestad avatar Jul 27 '24 06:07 hmottestad

If someone was depending on the parent pom I would assume they would get both versions of the Lucene library.

The maven resolver can only resolve a particular JAR to a single version, not to multiple.

If you inherit from the parent POM, you get the Lucene version defined there. Nobody should really depend on the parent POM. They should use the rdf4j-bom instead. But even if they depended on the parent POM, I don't see how they'd get Lucene version as a dependency and not even as a managed version.

The situation is somewhat similar to the rdf4j-rio-jsonld and rdf4j-rio-jsonld-legacy - the user has to make sure to only have one of those in their dependencies.

The new module locally overrides the Lucene version property - it won't leak. Presently, I didn't even add a managed dependency for the module to the BOM - something that actually should still be done.

reckart avatar Jul 27 '24 14:07 reckart

The json-ld side is a bit different. There we are depending on two completely different libraries. The only issue that would occur if you depend on both is that you wouldn't be sure which one you end up getting through the Java Service Provider Interface.

Here with Lucene we are depending on two versions of the same library.

hmottestad avatar Jul 27 '24 16:07 hmottestad