Upgrade Tika to the 2.4.1 release

Open bamthomas opened this issue 3 years ago • 3 comments

Is your feature request related to a problem? Please describe.

No.

Describe the solution you'd like

Datashare/extract depends on Tika 1.22 (released 1 August 2019). Since then there have been 4 releases; the latest is 1.26, and there is also a 2.0.0-alpha.

For now, the upgrade breaks the indexing features with a java.lang.NoSuchMethodError (see ICIJ/extract@55ff0cc7bc5acc570839849e149cc039fb5c45ad).

We need to check all of Tika's dependencies that are specified in the pom.xml (along with Datashare's transitive dependencies).

The root cause of the NoSuchMethodError seemed to be commons-codec, which needed to be upgraded from 1.10 to 1.13. But after doing that we still saw the error.
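One way to narrow down this kind of NoSuchMethodError is to ask the JVM which jar a class was actually loaded from and whether it exposes the expected method. The following is only a minimal sketch: `ClasspathProbe` and its helper names are hypothetical, and the class and method to probe are passed as arguments (the commons-codec names in the comment are just an example, not necessarily the method that failed here).

```java
import java.security.CodeSource;
import java.util.Arrays;

public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or a note if the JVM reports none. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader typically have no code source.
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    /** True if the class declares or inherits a public method with this name. */
    static boolean hasPublicMethod(Class<?> type, String methodName) {
        return Arrays.stream(type.getMethods()).anyMatch(m -> m.getName().equals(methodName));
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumption: class and method names come from the command line, for example
        //   org.apache.commons.codec.binary.Base64 encodeBase64String
        String className = args.length > 0 ? args[0] : ClasspathProbe.class.getName();
        String methodName = args.length > 1 ? args[1] : "main";
        Class<?> type = Class.forName(className);
        System.out.println(className + " loaded from " + locationOf(type));
        System.out.println("has public method " + methodName + ": " + hasPublicMethod(type, methodName));
    }
}
```

Run with the same classpath as the failing application, this shows whether an older jar is shadowing the upgraded one. `mvn dependency:tree -Dincludes=commons-codec` can then show which dependency pulls that version in.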

This may be related to https://issues.liferay.com/browse/LPS-120596

bamthomas • Apr 06 '21 09:04

Upgrading Tika early and often is a good idea. Let me know if you want to chat about migrating to >= 2.1.0.

tballison • Oct 13 '21 11:10

@tballison thanks for your message. I'm digging into it. Which approach do you think is best:

  • a progressive upgrade (1.24, 1.26, 2.0, ...)?
  • going straight to 2.1 and solving problems one by one?
  • another strategy?

bamthomas • Sep 20 '22 15:09

If you have time, I'd recommend going straight to 2.4.1. There aren't that many diffs/changes within 2.x. This is the documentation we've put together: https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

The 2.5.0 release should happen in the next few weeks, but that should be a drop-in replacement for 2.4.1.

Let me know if you have any questions on 2.x!

tballison • Sep 20 '22 15:09

This issue is stale because it has been open for 40 days with no activity.

github-actions[bot] • Nov 23 '22 00:11