
"[PDF2XML_CONVERSION_FAILURE] PDF to XML conversion failed" on macOS Sierra

Open • power10dan opened this issue 8 years ago • 8 comments

Hi,

I got the PDF2XML conversion failure on macOS Sierra. The error output is as follows:

```
/projectPath/grobid/grobid/grobid-home/models/reference-segmenter/model.wapiti
org.grobid.core.exceptions.GrobidException: [PDF2XML_CONVERSION_FAILURE] PDF to XML conversion failed on pdf file ./src/test/resources/Wang_paperAVE2008.pdf
	at org.grobid.core.document.DocumentSource.processPdf2XmlThreadMode(DocumentSource.java:184)
	at org.grobid.core.document.DocumentSource.pdf2xml(DocumentSource.java:133)
	at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:62)
	at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:49)
	at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:41)
	at org.grobid.core.engines.CitationParser.processingReferenceSection(CitationParser.java:210)
	at org.grobid.core.engines.Engine.processReferences(Engine.java:243)
	at org.grobidExample.ExampleBibTex.runGrobid(ExampleBibTex.java:52)
	at org.grobidExample.TestMyGrobid.testCitationBibTeX(TestMyGrobid.java:42)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
```

I am not sure what the issue is. Can somebody help me please?
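The trace above shows only the outer exception message. As a debugging aid (not part of GROBID or grobid-example), a small helper can walk the `getCause()` chain of whatever exception the calling code catches and print each level. If the chain has a single element, the wrapper carries no cause, and the converter's own console output is the next place to look. This is a minimal sketch in plain Java; the class name `RootCause` and the simulated exceptions in `main` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class RootCause {

    // Walks the getCause() chain and returns every throwable from outermost
    // to innermost, stopping if a cause repeats (guards against cycles).
    static List<Throwable> causeChain(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        Throwable current = t;
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            current = current.getCause();
        }
        return chain;
    }

    // Returns the innermost throwable, or null if t itself is null.
    static Throwable rootCause(Throwable t) {
        List<Throwable> chain = causeChain(t);
        return chain.isEmpty() ? null : chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Simulated exceptions: stand-ins for a wrapper thrown around a
        // lower-level converter failure.
        Exception inner = new java.io.IOException("converter failed");
        RuntimeException outer = new RuntimeException(
                "[PDF2XML_CONVERSION_FAILURE] PDF to XML conversion failed", inner);
        for (Throwable c : causeChain(outer)) {
            System.out.println(c.getClass().getName() + ": " + c.getMessage());
        }
        System.out.println("root cause: " + rootCause(outer).getMessage());
    }
}
```

In real code, `outer` would be the exception caught around the GROBID call rather than a hand-built one.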

power10dan • Sep 25, 2017

Hi @power10dan, could you please share the PDF and tell us which processing you were trying to perform?

lfoppiano • Sep 25, 2017

Hi,

I was attempting to run the GROBID example code with the PDF files that come with the examples.

power10dan • Sep 25, 2017

Hello!

In that case, I think this issue belongs in the grobid-example repository: https://github.com/kermitt2/grobid-example

Did you set the `grobid-home` path in the property file `grobid-example/grobid-example.properties`, as explained in the README of the grobid-example project?
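One way to double-check this setting is to load the properties file and verify that the configured `grobid-home` directory exists and has a `models` subdirectory (the trace above shows models living under `grobid-home/models/`). This is a hypothetical sketch: the key name `grobid_example.pGrobidHome` is an assumption, so substitute whatever key your `grobid-example.properties` actually defines. It also flags relative paths, because those resolve against the JVM working directory, which can differ between an IDE run and `mvn test`.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class GrobidHomeCheck {

    // HYPOTHETICAL key name: replace with the key used in your properties file.
    static final String HOME_KEY = "grobid_example.pGrobidHome";

    // Returns human-readable problems; an empty list means the configured
    // grobid-home path looks plausible.
    static List<String> check(Properties props, String key) {
        List<String> problems = new ArrayList<>();
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            problems.add("property '" + key + "' is missing or empty");
            return problems;
        }
        Path home = Paths.get(value.trim());
        if (!home.isAbsolute()) {
            // Relative paths depend on the working directory of the JVM.
            problems.add("relative path " + home + " resolves to " + home.toAbsolutePath());
        }
        if (!Files.isDirectory(home)) {
            problems.add("not a directory: " + home.toAbsolutePath());
        } else if (!Files.isDirectory(home.resolve("models"))) {
            problems.add("no 'models' subdirectory under " + home.toAbsolutePath());
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "grobid-example.properties");
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(file)) {
            props.load(r);
        }
        List<String> problems = check(props, HOME_KEY);
        if (problems.isEmpty()) {
            System.out.println("grobid-home path looks OK");
        } else {
            problems.forEach(p -> System.out.println("problem: " + p));
        }
    }
}
```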

kermitt2 • Sep 29, 2017

Yes I did. For some reason, the code still fails.

power10dan • Sep 29, 2017

TUW-217619.pdf

I also got this error.

purker • Nov 25, 2017

TUW-217619_without_comments.pdf

It also happens when the PDF has no comments.

purker • Nov 25, 2017

Hello @purker, and thanks. This is fixed in the latest version of our pdf2xml fork (https://github.com/kermitt2/pdf2xml). However, this latest version is not yet integrated into GROBID (except in the Windows version of GROBID), because some training data must first be updated to account for another improvement in the reading order of the PDF blocks.

kermitt2 • Nov 25, 2017

I stumbled upon this issue and quickly tested it on Linux and Mac.

The file without comments works fine, but the original one fails. I also tried running pdfalto directly on it, and it crashes with "Segmentation fault".

More details: https://github.com/kermitt2/pdfalto/issues/148
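To check whether a crash comes from the converter itself rather than from GROBID, the converter can be run on its own and its exit status inspected. The sketch below is an illustration, not GROBID's own code. It assumes a `pdfalto` binary on the `PATH` invoked as `pdfalto <input.pdf> <output.xml>` (check `pdfalto -h` for your build), and it relies on the JVM on Unix-like systems reporting a process killed by signal N as exit code 128 + N, so a segmentation fault (SIGSEGV, signal 11) shows up as 139. A program can also exit with such a status on its own, so treat the signal mapping as a hint rather than proof.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class RunConverter {

    // Maps an exit code to a description. Codes in 129..159 are read as
    // "killed by signal (code - 128)", the convention Java uses on Unix.
    static String describeExit(int code) {
        if (code == 0) return "success";
        if (code > 128 && code < 160) {
            int signal = code - 128;
            String name = signal == 11 ? "SIGSEGV (segmentation fault)"
                        : signal == 9 ? "SIGKILL"
                        : signal == 6 ? "SIGABRT"
                        : "signal " + signal;
            return "killed by " + name;
        }
        return "exited with status " + code;
    }

    // Runs the command with a timeout and describes how it ended.
    static String run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .inheritIO() // show the converter's own stdout/stderr
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return "timed out after " + timeoutSeconds + "s";
        }
        return describeExit(p.exitValue());
    }

    public static void main(String[] args) throws Exception {
        // ASSUMED invocation: pdfalto <input.pdf> <output.xml>
        String pdf = args.length > 0 ? args[0] : "TUW-217619.pdf";
        List<String> cmd = Arrays.asList("pdfalto", pdf, pdf.replaceAll("\\.pdf$", ".xml"));
        System.out.println(run(cmd, 60));
    }
}
```

If pdfalto segfaults on the file as reported, running `java RunConverter TUW-217619.pdf` would be expected to report a SIGSEGV.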

lfoppiano • Jul 24, 2022