sword2-server library overrides Tika's apache-mime4j-core dependency with an older version
What steps does it take to reproduce the issue?
- Turn full text indexing on:
  curl -X PUT -d true http://localhost:8080/api/admin/settings/:SolrFullTextIndexing
- Create a dataset
- Upload an e-mail file, for example:
  From: A
  To: B
  Subject: C

  An infrequent word: peripatetic

  Attached here: email.txt
An error is displayed, even though the file is added. This is because full text indexing fails. The following error is found in the logs:
[2022-10-18T13:36:42.882+0200] [Payara 5.2022.3] [SEVERE] [] [edu.harvard.iq.dataverse.api.errorhandlers.ThrowableHandler] [tid: _ThreadID=108 _ThreadName=http-thread-pool::http-listener-1(16)] [timeMillis: 1666093002882] [levelValue: 1000] [[ javax.ejb.EJBException: org/apache/james/mime4j/stream/MimeConfig$Builder
at com.sun.ejb.containers.EJBContainerTransactionManager.processSystemException(EJBContainerTransactionManager.java:723)
at com.sun.ejb.containers.EJBContainerTransactionManager.completeNewTx(EJBContainerTransactionManager.java:652)
at com.sun.ejb.containers.EJBContainerTransactionManager.postInvokeTx(EJBContainerTransactionManager.java:482)
at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4601)
at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2134)
at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2104)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:220)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandlerDelegate.invoke(EJBLocalObjectInvocationHandlerDelegate.java:90)
at com.sun.proxy.$Proxy325.indexDataset(Unknown Source)
at edu.harvard.iq.dataverse.search.__EJB31_Generated__IndexServiceBean__Intf____Bean__.indexDataset(Unknown Source)
at edu.harvard.iq.dataverse.api.Index.indexDatasetByPersistentId(Index.java:319)
at jdk.internal.reflect.GeneratedMethodAccessor1916.invoke(Unknown Source)
....
Caused by: java.lang.NoClassDefFoundError: org/apache/james/mime4j/stream/MimeConfig$Builder
at org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:74)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:289)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:185)
at edu.harvard.iq.dataverse.search.IndexServiceBean.toSolrDocs(IndexServiceBean.java:1054)
at edu.harvard.iq.dataverse.search.IndexServiceBean.addOrUpdateDataset(IndexServiceBean.java:1307)
at edu.harvard.iq.dataverse.search.IndexServiceBean.addOrUpdateDataset(IndexServiceBean.java:731)
at edu.harvard.iq.dataverse.search.IndexServiceBean.indexDataset(IndexServiceBean.java:599)
at jdk.internal.reflect.GeneratedMethodAccessor1747.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.glassfish.ejb.security.application.EJBSecurityManager.runMethod(EJBSecurityManager.java:588)
at org.glassfish.ejb.security.application.EJBSecurityManager.invoke(EJBSecurityManager.java:408)
at com.sun.ejb.containers.BaseContainer.invokeBeanMethod(BaseContainer.java:4835)
at com.sun.ejb.EjbInvocation.invokeBeanMethod(EjbInvocation.java:665)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:834)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.doCall(SystemInterceptorProxy.java:163)
at com.sun.ejb.containers.interceptors.SystemInterceptorProxy.aroundInvoke(SystemInterceptorProxy.java:140)
at jdk.internal.reflect.GeneratedMethodAccessor151.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
at com.sun.ejb.EjbInvocation.proceed(EjbInvocation.java:615)
at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72)
at org.jboss.weld.module.ejb.SessionBeanInterceptor.aroundInvoke(SessionBeanInterceptor.java:52)
at jdk.internal.reflect.GeneratedMethodAccessor146.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at com.sun.ejb.containers.interceptors.AroundInvokeInterceptor.intercept(InterceptorManager.java:888)
at com.sun.ejb.containers.interceptors.AroundInvokeChainImpl.invokeNext(InterceptorManager.java:833)
at com.sun.ejb.containers.interceptors.InterceptorManager.intercept(InterceptorManager.java:375)
at com.sun.ejb.containers.BaseContainer.__intercept(BaseContainer.java:4807)
at com.sun.ejb.containers.BaseContainer.intercept(BaseContainer.java:4795)
at com.sun.ejb.containers.EJBLocalObjectInvocationHandler.invoke(EJBLocalObjectInvocationHandler.java:212)
... 78 more
- When does this issue occur? When an e-mail file is to be indexed.
- Which page(s) does it occur on? Not really a user interface issue, but an error is displayed when you upload the file.
- What happens? See above.
- To whom does it occur (all users, curators, superusers)? All users.
- What did you expect to happen? The file should be indexed correctly.
Which version of Dataverse are you using?
- v5.11.1
- develop
Any related open or closed issues to this bug report?
The problem is a recurring one: dependencies needed by Tika are overridden by older versions, so the required classes or methods are not found at runtime. In this case sword2-server is the culprit. The quick fix is to exclude apache-mime4j-core from the sword2-server dependency (thanks to @qqmyers for the suggestion). A more solid fix might be to introduce Java modules into Dataverse, so that the transitive dependencies of primary dependencies don't interfere with one another.
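A minimal sketch of that exclusion in Dataverse's pom.xml, assuming the SWORD library is declared as `io.gdcc:sword2-server` (check the actual groupId and version in the POM; `${sword2-server.version}` is a hypothetical placeholder). If no other dependency pulls in apache-mime4j-core, the only remaining path to it is through Tika, so Maven should resolve the version Tika was built against. Whether SWORD still works against that newer mime4j would need to be tested.

```xml
<dependency>
    <groupId>io.gdcc</groupId>
    <artifactId>sword2-server</artifactId>
    <!-- hypothetical placeholder: keep whatever version the POM already declares -->
    <version>${sword2-server.version}</version>
    <exclusions>
        <!-- Drop the older apache-mime4j-core that sword2-server brings in transitively,
             so the version required by Tika is the one that ends up on the classpath -->
        <exclusion>
            <groupId>org.apache.james</groupId>
            <artifactId>apache-mime4j-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

After the change, `mvn dependency:tree -Dincludes=org.apache.james` should list apache-mime4j-core only under the Tika dependency.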
@janvanmansum I'm sorry, but Java modules probably won't help here. The SWORD library requires an RFC 5023 (Atom Publishing Protocol) implementation, and the only one around is the dead Apache Abdera project (I guess no one wants to reimplement or fork that lib). As these dependencies are not Java 9+ module-enabled, I'm not sure modules would solve our problem.
Usually, the way to deal with such conflicts is a proper entry in the <dependencyManagement> section of Dataverse's POM. See also the dev guide, where I wrote a few words about this.
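A rough sketch of such a <dependencyManagement> entry. A version pinned here applies to every transitive path, so the SWORD path would also receive the newer mime4j instead of the old one. The `${mime4j.version}` property is hypothetical and would have to be set to the release the bundled Tika expects. apache-mime4j-dom (which depends on apache-mime4j-core) is pinned alongside it on the assumption that keeping both at the same version is safest. If the POM already has a <dependencyManagement> section, for example for imported BOMs, these entries belong in that existing section.

```xml
<dependencyManagement>
    <dependencies>
        <!-- Force one mime4j version for every transitive path (Tika, sword2-server, ...) -->
        <dependency>
            <groupId>org.apache.james</groupId>
            <artifactId>apache-mime4j-core</artifactId>
            <!-- hypothetical property: set it to the version Tika needs -->
            <version>${mime4j.version}</version>
        </dependency>
        <!-- Keep the DOM module in step with core -->
        <dependency>
            <groupId>org.apache.james</groupId>
            <artifactId>apache-mime4j-dom</artifactId>
            <version>${mime4j.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Unlike the exclusion, this keeps mime4j on the SWORD path but upgrades it there too, so the same caveat about testing SWORD applies.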
But as we control the lib (please find it at https://github.com/gdcc/sword2-server), we might choose a different path here. As the Abdera lib is no longer updated, it might be preferable to freeze its dependencies in time instead of keeping our fingers crossed on every update of transitive dependencies. How about shading the complete Abdera lib and its dependencies into the SWORD lib JAR?
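An untested sketch of what shading could look like in the sword2-server POM, using the maven-shade-plugin. The include list, the plugin version, and the relocation prefix `org.swordapp.shaded` are illustrative assumptions; a real configuration would have to cover every Abdera dependency that should be frozen. Relocation is the important part: without it, the old mime4j classes would simply move into the SWORD JAR under their original names and still clash with Tika's on the classpath. Abdera's own packages are left unrelocated here, since code using the SWORD library may reference Abdera types directly.

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <!-- example version; pick a current release -->
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <!-- Illustrative: bundle Abdera and the old mime4j into the SWORD JAR -->
                <artifactSet>
                    <includes>
                        <include>org.apache.abdera:*</include>
                        <include>org.apache.james:apache-mime4j-core</include>
                    </includes>
                </artifactSet>
                <!-- Rename the bundled mime4j packages so they cannot collide with Tika's copy -->
                <relocations>
                    <relocation>
                        <pattern>org.apache.james.mime4j</pattern>
                        <shadedPattern>org.swordapp.shaded.mime4j</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>
```

By default the plugin also generates a dependency-reduced POM, so consumers such as Dataverse would no longer see the shaded artifacts as transitive dependencies.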
@poikilotherm do you think we should do what @PaulBoon did in the following pull request?
- https://github.com/DANS-KNAW/dataverse/pull/173
@pdurbin Is there any reason not to apply the suggested fix? It would be great if we could get rid of this issue with the next release.
@PaulBoon you're talking about https://github.com/DANS-KNAW/dataverse/pull/173 right, not the fix @poikilotherm suggested above?
@pdurbin I am talking about the solution we have working. If a better solution becomes available soon, that would be nice; meanwhile we can use the fix we have.
@PaulBoon gotcha, thanks.
From the commit above, it looks like @qqmyers picked it up for QDR. 😄