rdf4j
Upgrade Solr/Lucene and ES dependencies
Problem description
Giving it another try... ElasticSearch and Apache Lucene/Solr could be upgraded to newer versions. There is a change / issue with the highlighter in Solr 8.5, so some code changes could be required.
Preferred solution
Upgrade to the latest Lucene and Solr 8.x versions + ES 7.10.x (the latest with an OSI-approved license)
Are you interested in contributing a solution yourself?
Yes
Alternatives you've considered
No response
Anything else?
See also discussion in #2392
As a reminder, something seems to have changed in the Solr Highlighter (8.5 and up): the org.eclipse.rdf4j.sail.solr.SolrIndex query() method does seem to get the correct number of search results in 8.5 and up, but no highlight results (which are retrieved separately) whatsoever. As a result, the tests in SolrSailIndexedPropertiesTest fail.
The Solr DefaultHighlighter is processing fields differently in 8.5, an option would be to use the UnifiedHighlighter, but this fails the testUnionQuery...
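For anyone who wants to compare the highlighters outside of RDF4J: Solr lets you pick the highlighter per request via the `hl.method` parameter (`original`, `unified` or `fastVector`). Below is a rough sketch, using only the JDK, that builds such a request query string. It is not the code used in SolrIndex; the class name, the helper names and the field name `text` are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class HighlightParams {

    // Collects the Solr request parameters for a highlighted query.
    // "method" is passed through as hl.method (original, unified or fastVector).
    static Map<String, String> highlightParams(String query, String field, String method) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", query);
        params.put("hl", "true");       // switch highlighting on
        params.put("hl.fl", field);     // field(s) to highlight
        params.put("hl.method", method);
        return params;
    }

    // URL-encodes the parameters into a query string.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // "text" is a hypothetical field name, not necessarily what RDF4J indexes.
        for (String method : new String[] { "original", "unified" }) {
            System.out.println(toQueryString(highlightParams("text:alice", "text", method)));
        }
    }
}
```

Appending each printed query string to the select URL of a test core should make it easier to see which highlighter returns snippets for a given field.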
Although RDF4J is probably not at risk (we don't distribute this indirect log4j dependency; one has to use a project with Solr), upgrading to Lucene / Solr 8.11.2 would get rid of the warnings about log4j 1.2 and 2.14 CVEs in automated scanning tools, since the newer libraries use log4j v2.17.1 instead.
Using 8.11.2, Solr compliance test fails on AbstractLuceneSailTest.testSnippetLimitedToPredicate()
Also, solr/cores/embedded/conf/solrconfig.xml could be changed to run the Solr tests in memory
<directoryFactory class="org.apache.solr.core.RAMDirectoryFactory"/>
<indexConfig>
<lockType>single</lockType>
</indexConfig>
(otherwise using 8.11 seems to break tests using disk storage: write.lock file issue, maybe due to multiple threads or race condition...)
Let's be slightly less ambitious, since ES 7.16 moves the (geo) shapebuilder to another (legacy) package, which isn't readily available from Maven.
So using Solr/Lucene 8.9 and ES 7.15.2 as a compromise
Assuming we move to Solr/Lucene 9.x and ES 8.x in RDF4J 5.0.0 (which may have breaking changes)
How is the licensing compatibility for ES nowadays?
I've filed an automated request for 1 ES dependency (https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/6158), so let's see if/how it works out...
Could be a non-issue, since it's a more relaxed "works-with" dependency (but it looks like dash-tool does not support / is not meant to handle "works-with", see https://github.com/eclipse/dash-licenses/issues/13) and we don't distribute elasticsearch jars
(we do distribute lucene / solr jars, but they are approved so no problem there)
We had a constructive meeting with Eclipse, they are looking into the final nitty-gritty details but basically they are OK with our setup ("works-with", not required, no distribution of ES-jars).
It would, however, be a good idea to clearly mention on our website / in the javadocs that users should check the ES-license if they want to use ES, especially in a service-offering model.
Created a series of IP requests (using dash-license) for the various ES client jar files, most of them are now under investigation
Some discussion on ElasticSearch runner, a component used for testing (very convenient for starting an ES server, running tests and stopping/removing the test-ES)
See https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/6409
This plugin might be an alternative: https://github.com/alexcojocaru/elasticsearch-maven-plugin
I think we should go with Testcontainers.
https://www.testcontainers.org/modules/elasticsearch/
Sounds good, I'm using testcontainers for other projects, and it works as advertised...
Actually the maven elasticsearch plugin works just fine with openjdk-19 on my debian machine, but for some reason ES fails to start on the github machine (action with ubuntu-latest, not sure which JDK-brand) :-/
In the series "it works fine on my machine", apparently this issue makes it somewhat tricky to repeat that on the github CI machine: https://github.com/elastic/elasticsearch/issues/49124
Finally got it to work, so now waiting for the last CQ requests
Still no updates on the last CQ requests, I've added a few friendly reminders to the comments to the Eclipse IP gitlab issues...
Slowly inching towards approval, 2 more dependencies have been approved, 3 to go
All approved, but we'll need to double-check our documentation to clearly mention that (re)users really must check the ES conditions / FAQ in the (unlikely?) case they provide an as-a-service solution using ES Sail/Search (also see comments at the bottom of https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/7318)