grobid
"[PDF2XML_CONVERSION_FAILURE] PDF to XML conversion failed" for Mac OSX Sierra
Hi,
I got the PDF2XML failure on macOS Sierra. The error output is as follows:
`/projectPath/grobid/grobid/grobid-home/models/reference-segmenter/model.wapiti
org.grobid.core.exceptions.GrobidException: [PDF2XML_CONVERSION_FAILURE] PDF to XML conversion failed on pdf file ./src/test/resources/Wang_paperAVE2008.pdf
    at org.grobid.core.document.DocumentSource.processPdf2XmlThreadMode(DocumentSource.java:184)
    at org.grobid.core.document.DocumentSource.pdf2xml(DocumentSource.java:133)
    at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:62)
    at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:49)
    at org.grobid.core.document.DocumentSource.fromPdf(DocumentSource.java:41)
    at org.grobid.core.engines.CitationParser.processingReferenceSection(CitationParser.java:210)
    at org.grobid.core.engines.Engine.processReferences(Engine.java:243)
    at org.grobidExample.ExampleBibTex.runGrobid(ExampleBibTex.java:52)
    at org.grobidExample.TestMyGrobid.testCitationBibTeX(TestMyGrobid.java:42)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
    at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
    at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)`
I am not sure what the issue is. Can somebody help me please?
Hi @power10dan, could you please provide the PDF and describe the processing you were trying to perform?
Hi,
So I was attempting to run the grobid example code, using the PDF files that came with the examples.
Hello!
I think this is an issue for this repo then: https://github.com/kermitt2/grobid-example
Did you set the grobid-home path in the property file grobid-example/grobid-example.properties as explained in the readme of the grobid-example project?
Yes I did. For some reason, the code still fails.
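For anyone who wants to double-check that setting before digging further, here is a minimal sketch that reads the property file and verifies that the configured grobid-home directory exists and contains the model file named in the stack trace. The class name `GrobidHomeCheck` is made up for illustration, and the property key `grobid_example.pGrobidHome` is an assumption: look in your own `grobid-example.properties` for the actual key and adjust `HOME_KEY` accordingly.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class GrobidHomeCheck {
    // ASSUMPTION: the key name may differ in your grobid-example.properties.
    static final String HOME_KEY = "grobid_example.pGrobidHome";

    /** Returns a description of the first problem found, or null if none. */
    static String check(Path propertiesFile, String key) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(propertiesFile)) {
            props.load(in);
        }

        String home = props.getProperty(key);
        if (home == null || home.trim().isEmpty()) {
            return "property " + key + " is not set";
        }

        Path homeDir = Paths.get(home.trim());
        if (!Files.isDirectory(homeDir)) {
            return "not a directory: " + homeDir;
        }

        // The stack trace above mentions this model file under grobid-home.
        Path model = homeDir.resolve("models/reference-segmenter/model.wapiti");
        if (!Files.isRegularFile(model)) {
            return "model file missing: " + model;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to the property file as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "grobid-example.properties");
        String problem = check(file, HOME_KEY);
        System.out.println(problem == null
                ? "grobid-home looks usable"
                : "Problem: " + problem);
    }
}
```

If this reports no problem, the path setting is probably not the cause, and the failure is more likely in the PDF to XML conversion step itself.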
Hello @purker, and thanks. This is fixed in the latest version of our pdf2xml fork (https://github.com/kermitt2/pdf2xml). However, this latest version is not integrated in GROBID yet (except in the Windows version of GROBID), because some training data must first be updated to take into account other improvements in the reading order of the PDF blocks.
I stumbled upon this issue and I've quickly tested it on Linux and Mac.
The file without comments works fine, but the original one fails. I've tried directly with pdfalto and it crashes with "Segmentation fault".
More details: https://github.com/kermitt2/pdfalto/issues/148
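To reproduce this kind of crash outside GROBID, it can help to run the converter binary on its own and look at its exit status. The sketch below is only an illustration: `ConverterProbe` is a hypothetical helper, the full command line (binary path, options, PDF file) is whatever you pass as arguments, and nothing here reflects how GROBID itself invokes the converter. The check for exit code 139 relies on the common Unix convention of 128 plus the signal number (11 for SIGSEGV), which may not hold on every platform.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ConverterProbe {
    /** Runs the command and returns its exit code, or -1 if it timed out. */
    static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO(); // show the converter's own output in this console
        Process p = pb.start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            return -1;
        }
        return p.exitValue();
    }

    /** Turns an exit code from run() into a short human-readable message. */
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "converter exited normally";
        }
        if (exitCode == -1) {
            return "converter timed out";
        }
        if (exitCode == 139) {
            // ASSUMPTION: 128 + 11 (SIGSEGV), the usual Unix convention.
            return "converter probably crashed with a segmentation fault";
        }
        return "converter failed with exit code " + exitCode;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ConverterProbe <command> [args...]");
            return;
        }
        int code = run(Arrays.asList(args), 60);
        System.out.println(describe(code));
    }
}
```

Running it once on the original PDF and once on the copy without comments should show whether the crash depends on the file rather than on the GROBID configuration.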