tabula-java

Getting a "subprocess" error in a container for the same PDF files that work fine with Tabula on a local machine.

Open deepakdhiman7 opened this issue 1 year ago • 0 comments

We are getting the "subprocess" error below when running our code in a container. On a local machine, however, the same code works fine. We installed Tabula on the local machine a year ago. Even in the container, it was working fine until this week. The PDFs for which it fails are attached, and the package versions are listed below. Could the PDF files be the cause, even though the same files run with the same version on the local machine? Or is it the environment? We checked, and there have been no updates to the environment, permissions, etc.

PDFs: IONIS Registartion document (002).pdf test_Vinayak.pdf Uploading Annual_Report.pdf…

Package Versions:
(llms) dd00740409@ns3067540:~$ java -version
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)

(llms) dd00740409@ns3067540:~$ python
Python 3.8.17 | packaged by conda-forge | (default, Jun 16 2023, 07:06:00)
[GCC 11.4.0] on linux

Error: subprocess.CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', '/usr/local/lib/python3.8/site-packages/tabula/tabula-1.0.5-jar-with-dependencies.jar', '--pages', '9', '--stream', '--guess', '--format', 'JSON', 'Roa8dvYUVmHQLKhhvTiPL.pdf']' returned non-zero exit status 1.
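The CalledProcessError above reports only the exit status; the Java stack trace is written to the process's stderr. A minimal sketch for capturing that output when re-running the command by hand could look like the following. The helper name `run_and_capture` is hypothetical, and the command in the demo is a stand-in so the sketch runs without Java installed; the real command list would be the one shown in the error message.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit status, stderr text) without raising on failure."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stderr


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration only); replace it with
    # the java command list from the error message to reproduce the failure.
    cmd = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    status, stderr_text = run_and_capture(cmd)
    print("exit status:", status)
    print("stderr:", stderr_text)
```

Running the same helper in the container and on the local machine would make it easier to compare the two environments side by side.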

Logs:
Exception in thread "main" java.lang.UnsatisfiedLinkError: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libjavajpeg.so: libjpeg.so.8: cannot open shared object file: No such file or directory
    at java.lang.ClassLoader$NativeLibrary.load(Native Method)
    at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1838)
    at java.lang.Runtime.loadLibrary0(Runtime.java:843)
    at java.lang.System.loadLibrary(System.java:1136)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader$1.run(JPEGImageReader.java:92)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader$1.run(JPEGImageReader.java:90)
    at java.security.AccessController.doPrivileged(Native Method)
    at com.sun.imageio.plugins.jpeg.JPEGImageReader.<clinit>(JPEGImageReader.java:89)
    at com.sun.imageio.plugins.jpeg.JPEGImageReaderSpi.createReaderInstance(JPEGImageReaderSpi.java:85)
    at javax.imageio.spi.ImageReaderSpi.createReaderInstance(ImageReaderSpi.java:320)
    at javax.imageio.ImageIO$ImageReaderIterator.next(ImageIO.java:529)
    at javax.imageio.ImageIO$ImageReaderIterator.next(ImageIO.java:513)
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:155)
    at org.apache.pdfbox.filter.DCTFilter.decode(DCTFilter.java:58)
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:80)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:243)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.createInputStream(PDImageXObject.java:791)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.from8bit(SampledImageReader.java:517)
    at org.apache.pdfbox.pdmodel.graphics.image.SampledImageReader.getRGBImage(SampledImageReader.java:226)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:481)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.getImage(PDImageXObject.java:462)
    at org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1110)
    at org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:67)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:933)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:514)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:492)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:277)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:347)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:268)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:254)
    at technology.tabula.Utils.pageConvertToImage(Utils.java:285)
    at technology.tabula.detectors.NurminenDetectionAlgorithm.detect(NurminenDetectionAlgorithm.java:101)
    at technology.tabula.CommandLineApp$TableExtractor.extractTablesBasic(CommandLineApp.java:421)
    at technology.tabula.CommandLineApp$TableExtractor.extractTables(CommandLineApp.java:408)
    at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:180)
    at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:124)
    at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:106)
    at technology.tabula.CommandLineApp.main(CommandLineApp.java:76)
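The log above names libjpeg.so.8 as the shared library that could not be opened when the JVM's JPEG reader was initialised. One way to compare the container with the local machine is to ask the dynamic loader for that library in each environment. The sketch below is only a rough check under assumptions: the helper name `can_load_shared_library` is hypothetical, and it tests what the Python process's loader can open, which may differ from what the JVM sees if the two run with different library search paths.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    # libjpeg.so.8 is the library named in the UnsatisfiedLinkError above.
    for lib in ("libjpeg.so.8",):
        status = "found" if can_load_shared_library(lib) else "MISSING"
        print(f"{lib}: {status}")
```

If the library is reported as missing only inside the container, that would point towards the environment rather than the PDF files.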

deepakdhiman7 • Feb 23 '24 07:02