camelot
Unknown GhostScript error after successful camelot installation
Hello,
I am currently on an Amazon EC2 Linux machine and have installed camelot through Anaconda with conda install -c conda-forge camelot-py. The installation happened without any issues. I could see Ghostscript as part of the dependencies being installed through Anaconda.
Afterwards, I attempted to extract the table from the example foo.pdf. From the documentation, this should be a simple tables = camelot.read_pdf('foo.pdf'). However, immediately after running that command, I received the following long error.
---------------------------------------------------------------------------
GhostscriptError Traceback (most recent call last)
<ipython-input-8-6d588ec94ca5> in <module>
----> 1 tables = camelot.read_pdf('./PDFs/foo.pdf')
/usr/local/.../lib/python3.8/site-packages/camelot/io.py in read_pdf(filepath, pages, password, flavor, suppress_stdout, layout_kwargs, **kwargs)
111 p = PDFHandler(filepath, pages=pages, password=password)
112 kwargs = remove_extra(kwargs, flavor=flavor)
--> 113 tables = p.parse(
114 flavor=flavor,
115 suppress_stdout=suppress_stdout,
/usr/local/.../lib/python3.8/site-packages/camelot/handlers.py in parse(self, flavor, suppress_stdout, layout_kwargs, **kwargs)
169 parser = Lattice(**kwargs) if flavor == "lattice" else Stream(**kwargs)
170 for p in pages:
--> 171 t = parser.extract_tables(
172 p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs
173 )
/usr/local/.../lib/python3.8/site-packages/camelot/parsers/lattice.py in extract_tables(self, filename, suppress_stdout, layout_kwargs)
400 return []
401
--> 402 self._generate_image()
403 self._generate_table_bbox()
404
/usr/local/.../lib/python3.8/site-packages/camelot/parsers/lattice.py in _generate_image(self)
217 gs_call = gs_call.encode().split()
218 null = open(os.devnull, "wb")
--> 219 with Ghostscript(*gs_call, stdout=null) as gs:
220 pass
221 null.close()
/usr/local/.../lib/python3.8/site-packages/camelot/ext/ghostscript/__init__.py in Ghostscript(*args, **kwargs)
88 if __instance__ is None:
89 __instance__ = gs.new_instance()
---> 90 return __Ghostscript(
91 __instance__,
92 args,
/usr/local/.../lib/python3.8/site-packages/camelot/ext/ghostscript/__init__.py in __init__(self, instance, args, stdin, stdout, stderr)
37 if stdin or stdout or stderr:
38 self.set_stdio(stdin, stdout, stderr)
---> 39 rc = gs.init_with_args(instance, args)
40 self._initialized = True
41 if rc == gs.e_Quit:
/usr/local/.../lib/python3.8/site-packages/camelot/ext/ghostscript/_gsprint.py in init_with_args(instance, argv)
172 rc = libgs.gsapi_init_with_args(instance, len(argv), c_argv)
173 if rc not in (0, e_Quit, e_Info):
--> 174 raise GhostscriptError(rc)
175 return rc
176
GhostscriptError: -770376232
The number at the end appears to change every time I run the command, but it stays in the general area of -700 million. The error above is from a Jupyter Notebook. Running the same code from the plain command line simply prints Segmentation Fault. Downgrading Python from 3.8 to 3.6 did not fix the issue.
I tried to see if this was a GhostScript problem, but running
gs -sDEVICE=txtwrite -o extractedText.txt ./PDFs/foo.pdf
worked as intended and I could see the text document all nicely formatted. I am unsure as to what the problem could be at this point. Any help is appreciated.
Thanks!
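Since the rest of the thread turns on which Ghostscript version is in play, it may help to record what the gs executable reports. Below is a minimal sketch using only the standard library; gs_cli_version is a made-up helper name, and it only looks at the gs binary on PATH, which is not necessarily the same build as the libgs shared library that Camelot loads.

```python
import shutil
import subprocess


def gs_cli_version():
    """Return (path, version) for the gs executable on PATH, or None if absent."""
    path = shutil.which("gs")
    if path is None:
        return None
    # "gs --version" prints only the version string.
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, result.stdout.strip()


print(gs_cli_version())
```

If the path points outside the conda environment (for example to a system install from apt), the command-line test above and Camelot may be exercising two different Ghostscript builds.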
I've seen this issue as well. I've been able to fix the issue by using ghostscript-9.26 installed using apt-get on ubuntu. The 9.53 version seems to be causing the issue with seg faults.
Having the same issue as stated above, gs-version is 9.26 installed using apt-get on ubuntu.
Hi All,
I hit something similar to this issue #193. The stack trace seems to be pointing to not finding libgs.
Have you confirmed that libgs is installed per the docs? See the Camelot docs on how to confirm a working Ghostscript install.
Anaconda did not fix the partial Ghostscript build bug until Nov 2020, so perhaps the version you are running still has it?
If using apt, consider searching for and installing the libgs9 package: apt install libgs9
Not sure, I'm getting the ghostscript error (also installed via conda-forge), even though libgs9 is already the newest version (9.50~dfsg-5ubuntu4.2).
conda install camelot-py "ghostscript<9.52"
# Installed ghostscript 9.22
This doesn't cause the errors any longer.
Is this a problem with version 9.5+ of ghostscript, or is it something Camelot tries to do with Ghostscript that has changed?
Just to confirm can you run the following in your environment:
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library
>>> find_library("gs")
'libgs.so.9'
>>>
Do you get similar output or not? If you get a return as above, then you don't have the problem I am describing - it is something new I am not aware of.
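To go one step beyond find_library, the sketch below asks the library that ctypes actually loads for its revision number through Ghostscript's gsapi_revision call. The struct layout follows gsapi_revision_t in Ghostscript's iapi.h; loaded_libgs_revision is a hypothetical helper name, and this is only a diagnostic sketch, not something Camelot provides.

```python
import ctypes
from ctypes.util import find_library


class GsapiRevision(ctypes.Structure):
    # Mirrors gsapi_revision_t from Ghostscript's iapi.h.
    _fields_ = [
        ("product", ctypes.c_char_p),
        ("copyright", ctypes.c_char_p),
        ("revision", ctypes.c_long),
        ("revisiondate", ctypes.c_long),
    ]


def loaded_libgs_revision():
    """Return (library_name, revision) for the libgs ctypes finds, or None."""
    name = find_library("gs")
    if name is None:
        return None
    try:
        libgs = ctypes.CDLL(name)
    except OSError:
        return None
    info = GsapiRevision()
    # gsapi_revision returns 0 on success.
    rc = libgs.gsapi_revision(ctypes.byref(info), ctypes.sizeof(info))
    if rc != 0:
        return None
    return name, info.revision


print(loaded_libgs_revision())
```

Comparing this revision with the version of the gs executable shows whether Python is picking up a different Ghostscript build than the command line.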
I just tried updating to ghostscript 9.54.0 h9c3ff4c_0 conda-forge/linux-64.
Back to getting errors. Jupyter kernel literally fails and restarts when I try to load a PDF with Camelot.
Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:02:46)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.23.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from ctypes.util import find_library
In [2]: find_library("gs")
Out[2]: '/opt/miniconda3/envs/myenv/lib/libgs.so.9'
It seems that Ghostscript starting with version 9.2x or later causes these errors in Camelot. Might a breaking change have been introduced in Ghostscript, with Camelot still trying to use an API that no longer works?
Yes - you may want to use the python gs module and simply open the PDF without using Camelot to isolate the issue a little more. Unfortunately I have limited expertise here.
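A related way to narrow this down without taking the Jupyter kernel with it is to run the failing call in a child interpreter and inspect how it exits. This is a rough sketch using only the standard library; run_isolated is a made-up helper name, and the PDF path is the one from the original report.

```python
import subprocess
import sys


def run_isolated(code, timeout=300):
    """Run a Python snippet in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# Path taken from the original report; replace with the PDF that crashes.
snippet = "import camelot; print(camelot.read_pdf('./PDFs/foo.pdf'))"
returncode, stderr = run_isolated(snippet)
if returncode < 0:
    # On POSIX, a negative code means the child was killed by that signal number.
    print(f"child killed by signal {-returncode}")
else:
    print(f"child exited with code {returncode}")
    print(stderr)
```

On Linux a segmentation fault in the child should show up as signal 11, while an ordinary Python exception shows up as a non-zero exit code with the traceback in stderr.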
Thanks, well hoping someone from the Camelot team will take a look here to fix the issue.
UPDATE on issue (still persisting with ghostscript 9.54.0):
Camelot breaks (the Jupyter kernel literally has to restart) when trying to read a PDF with the lattice flavor. However, it seems to work with the stream flavor.
Yep only the lattice flavor uses ghostscript. I'll have to figure out a way to reproduce this issue.
Meanwhile, can you try installing the latest version with pip install "camelot-py[base]==0.10.1" and then trying out the poppler image conversion backend? Here's a snippet:
import camelot
tables = camelot.read_pdf("https://camelot-py.readthedocs.io/en/master/_static/pdf/foo.pdf", backend="poppler")
tables[0]
# <Table shape=(7, 7)>
More info in the docs here: https://camelot-py.readthedocs.io/en/master/user/advanced.html#use-alternate-image-conversion-backends
I was running into this same issue with Ghostscript 9.50. The lattice flavor would immediately crash Python or Jupyter with:
"../path/to/file/" terminated by signal SIGSEGV (Address boundary error)
Changing the call to
camelot.read_pdf(.... backend="poppler")
stopped this and I managed to parse my PDFs with no issues thus far.