rdf4j

Upgrade Solr/Lucene and ES dependencies

Open barthanssens opened this issue 2 years ago • 2 comments

Problem description

Giving it another try... ElasticSearch and Apache Lucene/Solr could be upgraded to newer versions. There is a change / issue with the highlighter in Solr 8.5, so some code changes could be required.

Preferred solution

Upgrade to the latest Lucene and Solr 8.x versions + ES 7.10.x (the latest with an OSI-approved license)

Are you interested in contributing a solution yourself?

Yes

Alternatives you've considered

No response

Anything else?

See also discussion in #2392

barthanssens avatar Nov 03 '21 20:11 barthanssens

As a reminder, something seems to have changed in the Solr Highlighter (8.5 and up): the org.eclipse.rdf4j.sail.solr.SolrIndex query() method does seem to get the correct number of search results, but no highlight results (which are retrieved separately) whatsoever. As a result, the tests in SolrSailIndexedPropertiesTest fail.

barthanssens avatar Nov 04 '21 18:11 barthanssens

The Solr DefaultHighlighter processes fields differently in 8.5. An option would be to use the UnifiedHighlighter, but this fails the testUnionQuery...

barthanssens avatar Nov 05 '21 14:11 barthanssens

Although RDF4J is probably not at risk (we don't distribute this indirect log4j dependency; one has to use a project with Solr), upgrading to Lucene / Solr 8.11.2 would get rid of warnings about log4j 1.2 and 2.14 CVEs in automated scanning tools, since the newer libraries use log4j v2.17.1 instead.

Using 8.11.2, Solr compliance test fails on AbstractLuceneSailTest.testSnippetLimitedToPredicate()

Also, solr/cores/embedded/conf/solrconfig.xml could be changed to run the Solr tests in memory

	<directoryFactory class="org.apache.solr.core.RAMDirectoryFactory"/>
	<indexConfig>
		<lockType>single</lockType>
	</indexConfig>

(otherwise using 8.11 seems to break tests using disk storage: write.lock file issue, maybe due to multiple threads or race condition...)

barthanssens avatar Jan 06 '23 00:01 barthanssens

Be slightly less ambitious, since ES 7.16 moves the (geo) shapebuilder to another (legacy) package, which isn't readily available from Maven.

So using Solr/Lucene 8.9 and ES 7.15.2 as a compromise

Assuming we move to Solr/Lucene 9.x and ES 8.x in RDF4J 5.0.0 (which may have breaking changes)

barthanssens avatar Jan 08 '23 14:01 barthanssens
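
As a rough illustration of the compromise described above, the version pins could be expressed as Maven properties along these lines. This is only a sketch: the property names are hypothetical and not taken from the RDF4J build, and the 8.9.0 patch level is an assumption, since the comment only mentions "8.9".

```xml
<properties>
	<!-- Hypothetical property names, for illustration only -->
	<!-- Patch level 8.9.0 is assumed; the thread only says "8.9" -->
	<lucene.version>8.9.0</lucene.version>
	<solr.version>8.9.0</solr.version>
	<!-- ES version as stated in the comment above -->
	<elasticsearch.version>7.15.2</elasticsearch.version>
</properties>
```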

How is the licensing compatibility for ES nowadays?

hmottestad avatar Jan 08 '23 15:01 hmottestad

I've filed an automated request for 1 ES dependency (https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/6158), so let's see if/how it works out...

barthanssens avatar Jan 08 '23 21:01 barthanssens

Could be a non-issue, since it's a more relaxed "works-with" dependency (but it looks like dash-tool does not support / is not meant to handle "works-with", see https://github.com/eclipse/dash-licenses/issues/13) and we don't distribute elasticsearch jars

(we do distribute lucene / solr jars, but they are approved so no problem there)

barthanssens avatar Jan 09 '23 13:01 barthanssens

We had a constructive meeting with Eclipse, they are looking into the final nitty-gritty details but basically they are OK with our setup ("works-with", not required, no distribution of ES-jars).

It would, however, be a good idea to clearly mention on our website / in the javadocs that users should check the ES-license if they want to use ES, especially in a service-offering model.

barthanssens avatar Jan 19 '23 10:01 barthanssens

Created a series of IP requests (using dash-license) for the various ES client jar files, most of them are now under investigation

barthanssens avatar Jan 24 '23 10:01 barthanssens

Some discussion on ElasticSearch runner, a component used for testing (very convenient for starting an ES server, running tests and stopping/removing the test-ES)

See https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/6409

barthanssens avatar Feb 02 '23 12:02 barthanssens

This plugin might be an alternative: https://github.com/alexcojocaru/elasticsearch-maven-plugin

barthanssens avatar Feb 21 '23 15:02 barthanssens

I think we should go with Testcontainers.

https://www.testcontainers.org/modules/elasticsearch/

hmottestad avatar Feb 21 '23 20:02 hmottestad

Sounds good, I'm using testcontainers for other projects, and it works as advertised...

barthanssens avatar Feb 21 '23 22:02 barthanssens
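
For reference, pulling in the Testcontainers Elasticsearch module suggested above would typically mean adding a test-scoped dependency like the one below. This is a minimal sketch, not the change that was actually made in RDF4J: the `testcontainers.version` property is a hypothetical placeholder, and running it requires a Docker environment on the build machine.

```xml
<dependency>
	<groupId>org.testcontainers</groupId>
	<artifactId>elasticsearch</artifactId>
	<!-- Hypothetical property; set it to whichever Testcontainers release is in use -->
	<version>${testcontainers.version}</version>
	<!-- Only needed for tests, so it is not distributed with the project -->
	<scope>test</scope>
</dependency>
```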

Actually the maven elasticsearch plugin works just fine with openjdk-19 on my debian machine, but for some reason ES fails to start on the github machine (action with ubuntu-latest, not sure which JDK-brand) :-/

barthanssens avatar Feb 27 '23 16:02 barthanssens

In the series "it works fine on my machine", apparently this issue makes it somewhat tricky to repeat that on the github CI machine: https://github.com/elastic/elasticsearch/issues/49124

barthanssens avatar Mar 01 '23 00:03 barthanssens

Finally got it to work, so now waiting for the last CQ requests

barthanssens avatar Mar 03 '23 21:03 barthanssens

Still no updates on the last CQ requests, I've added a few friendly reminders to the comments to the Eclipse IP gitlab issues...

barthanssens avatar Mar 28 '23 09:03 barthanssens

Slowly inching towards approval, 2 more dependencies have been approved, 3 to go

barthanssens avatar Apr 18 '23 08:04 barthanssens

All approved, but we'll need to double-check our documentation to clearly mention that (re)users really must check the ES conditions / FAQ in the (unlikely?) case they provide an as-a-service solution using ES Sail/Search (also see comments at the bottom of https://gitlab.eclipse.org/eclipsefdn/emo-team/iplab/-/issues/7318)

barthanssens avatar Apr 19 '23 14:04 barthanssens