camelot
ZeroDivisionError when reading PDF in text_in_bbox
Reading the PDF results in a ZeroDivisionError:
File "camelot/utils.py", line 376, in text_in_bbox
if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
ZeroDivisionError: float division by zero
ba is <LTTextLineHorizontal 926.967,593.860,926.967,601.300 '€\n'>, which has identical x0 and x1 coordinates, so it seems to have zero width and thus zero area.
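To illustrate why a zero-width text line triggers the error, here is a minimal, self-contained sketch of the overlap check with a guard for zero-area boxes. It is not camelot's actual code: the helper names (rect_area, rect_intersection_area, safe_overlap_ratio) are hypothetical, and boxes are assumed to be plain (x0, y0, x1, y1) tuples rather than pdfminer objects.

```python
def rect_area(box):
    """Area of an (x0, y0, x1, y1) box; degenerate boxes give 0.0."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def rect_intersection_area(a, b):
    """Area of the overlap between two (x0, y0, x1, y1) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    return width * height


def safe_overlap_ratio(a, b):
    """Fraction of box a covered by box b.

    Returns 0.0 instead of dividing by zero when a has no area
    (assumption: a zero-area box should count as "not overlapping").
    """
    area = rect_area(a)
    if area == 0:
        return 0.0
    return rect_intersection_area(a, b) / area


# The text line from this report: x0 == x1, so its area is zero.
zero_width = (926.967, 593.860, 926.967, 601.300)
other = (900.0, 590.0, 950.0, 610.0)
ratio = safe_overlap_ratio(zero_width, other)
```

Without the area check, the division in the last function would raise the same ZeroDivisionError for the box above. Whether a zero-area box should be treated as non-overlapping or handled some other way is a design choice for the maintainers.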
Longer stack trace at the end.
Steps to reproduce the bug
pip install camelot-py[cv]==0.10.1
Download the PDF below and read it with camelot.
Expected behavior
No errors.
Code
import camelot
camelot.read_pdf(filename)
https://media.frag-den-staat.de/files/docs/31/73/d9/3173d9a9ed904445a8eb0b1b6271e869/munster_anlage2.pdf
Environment
- OS: macOS 11.5, also Ubuntu 20.04
- Python version: 3.8.5
- Numpy version: 1.20.3
- OpenCV version: 4.5.1
- Ghostscript version: Python package 0.7, gs --version 9.54.0
- Camelot version: 0.10.1
Additional context
Stack trace
...
tables = camelot.read_pdf(filename)
File "camelot/io.py", line 113, in read_pdf
tables = p.parse(
File "camelot/handlers.py", line 176, in parse
t = parser.extract_tables(
File "camelot/parsers/lattice.py", line 430, in extract_tables
cols, rows, v_s, h_s = self._generate_columns_and_rows(table_idx, tk)
File "camelot/parsers/lattice.py", line 322, in _generate_columns_and_rows
t_bbox["horizontal"] = text_in_bbox(tk, self.horizontal_text)
File "camelot/utils.py", line 376, in text_in_bbox
if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
ZeroDivisionError: float division by zero
Hi, I also had this issue with
camelot-py==0.10.1
<LTTextLineHorizontal 202.905,138.447,202.905,149.442 '(cid:1)\n'>
File "/srv/fundcogito/fc-vault_venv/lib/python3.9/site-packages/camelot/parsers/stream.py", line 463, in extract_tables
cols, rows = self._generate_columns_and_rows(table_idx, tk)
File "/srv/fundcogito/fc-vault/document_parsers/utils/patch_camelot.py", line 92, in _generate_columns_and_rows
t_bbox["horizontal"] = text_in_bbox(tk, self.horizontal_text)
File "/srv/fundcogito/fc-vault_venv/lib/python3.9/site-packages/camelot/utils.py", line 378, in text_in_bbox
if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
ZeroDivisionError: float division by zero
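Both offending text lines in this thread have x0 equal to x1. One possible workaround, sketched below under stated assumptions, is to drop such degenerate lines before they reach the overlap check. The Box type and drop_degenerate helper are hypothetical stand-ins, not part of camelot or pdfminer. The sketch only assumes objects that expose x0, y0, x1 and y1 attributes, as pdfminer layout objects do.

```python
from collections import namedtuple

# Hypothetical stand-in for a pdfminer text line; only the
# coordinate attributes matter for this sketch.
Box = namedtuple("Box", "x0 y0 x1 y1 text")


def drop_degenerate(boxes, min_size=0.0):
    """Keep only boxes whose width and height both exceed min_size."""
    return [
        b for b in boxes
        if (b.x1 - b.x0) > min_size and (b.y1 - b.y0) > min_size
    ]


lines = [
    Box(202.905, 138.447, 202.905, 149.442, "(cid:1)\n"),  # zero width
    Box(10.0, 10.0, 50.0, 20.0, "ok\n"),
]
kept = drop_degenerate(lines)
```

Note that filtering this way discards the text of the dropped lines (for example the '€' glyph in the original report), so it trades completeness for not crashing.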
I have the same issue.