ZeroDivisionError when reading PDF in text_in_bbox

Open • stefanw opened this issue 3 years ago • 2 comments

Reading the PDF results in a ZeroDivisionError:

  File "camelot/utils.py", line 376, in text_in_bbox
    if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
ZeroDivisionError: float division by zero

ba is <LTTextLineHorizontal 926.967,593.860,926.967,601.300 '€\n'>, which seems to have zero width (x0 = x1 = 926.967) and therefore zero area.
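To make the arithmetic concrete, here is a standalone re-creation using the coordinates above. It assumes camelot's bbox_area is simply width times height of the (x0, y0, x1, y1) box; the Box class is a hypothetical stand-in for the pdfminer layout object, not camelot's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a pdfminer layout object's coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float


def bbox_area(b: Box) -> float:
    # Assumed to mirror camelot's bbox_area: width * height.
    return (b.x1 - b.x0) * (b.y1 - b.y0)


# The '€' text line from the traceback: x0 == x1, so its width is 0.
euro = Box(926.967, 593.860, 926.967, 601.300)

width = euro.x1 - euro.x0
height = euro.y1 - euro.y0
print(f"width={width}, height={height:.2f}, area={bbox_area(euro)}")
# prints "width=0.0, height=7.44, area=0.0"

# Any overlap with a zero-width box also has zero area, so the ratio in
# text_in_bbox becomes 0.0 / 0.0.
overlap = 0.0
try:
    ratio = overlap / bbox_area(euro)
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)  # prints "ZeroDivisionError: float division by zero"
```

The non-zero height (about 7.44) shows the glyph does have vertical extent; only its width collapses, which is enough to make the area zero.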

A longer stack trace is included at the end.

Steps to reproduce the bug

pip install camelot-py[cv]==0.10.1

Download the PDF below and read it with camelot.

Expected behavior

No errors.

Code

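import camelot
filename = "munster_anlage2.pdf"  # the PDF linked below, downloaded locally
tables = camelot.read_pdf(filename)  # raises ZeroDivisionError in text_in_bbox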

PDF

https://media.frag-den-staat.de/files/docs/31/73/d9/3173d9a9ed904445a8eb0b1b6271e869/munster_anlage2.pdf

Environment

  • OS: macOS 11.5, also Ubuntu 20.04
  • Python version: 3.8.5
  • Numpy version: 1.20.3
  • OpenCV version: 4.5.1
  • Ghostscript version: 9.54.0 (gs --version); ghostscript Python package 0.7
  • Camelot version: 0.10.1

Additional context

Stack trace
   ...
    tables = camelot.read_pdf(filename)
  File "camelot/io.py", line 113, in read_pdf
    tables = p.parse(
  File "camelot/handlers.py", line 176, in parse
    t = parser.extract_tables(
  File "camelot/parsers/lattice.py", line 430, in extract_tables
    cols, rows, v_s, h_s = self._generate_columns_and_rows(table_idx, tk)
  File "camelot/parsers/lattice.py", line 322, in _generate_columns_and_rows
    t_bbox["horizontal"] = text_in_bbox(tk, self.horizontal_text)
  File "camelot/utils.py", line 376, in text_in_bbox
    if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
ZeroDivisionError: float division by zero
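One possible guard, sketched as a standalone function rather than a patch to camelot: check the area of ba before dividing. The names Box and mostly_covered are hypothetical, and bbox_area / bbox_intersection_area are re-created here under the assumption that they compute plain rectangle areas; this is not necessarily how camelot itself resolves the issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical stand-in for pdfminer's LTTextLineHorizontal coordinates.
    x0: float
    y0: float
    x1: float
    y1: float


def bbox_area(b: Box) -> float:
    return (b.x1 - b.x0) * (b.y1 - b.y0)


def bbox_intersection_area(ba: Box, bb: Box) -> float:
    # Overlap rectangle; negative extents (disjoint boxes) are clamped to zero.
    w = min(ba.x1, bb.x1) - max(ba.x0, bb.x0)
    h = min(ba.y1, bb.y1) - max(ba.y0, bb.y0)
    return max(w, 0.0) * max(h, 0.0)


def mostly_covered(ba: Box, bb: Box, threshold: float = 0.8) -> bool:
    """Hypothetical guarded version of the '> 0.8' test in text_in_bbox."""
    area = bbox_area(ba)
    if area <= 0:
        # Degenerate box (zero width or height): the ratio is undefined,
        # so report "not covered" instead of dividing by zero.
        return False
    return bbox_intersection_area(ba, bb) / area > threshold


euro = Box(926.967, 593.860, 926.967, 601.300)  # zero-width line from the report
cell = Box(900.0, 590.0, 950.0, 605.0)          # made-up enclosing box
small = Box(910.0, 595.0, 920.0, 600.0)         # made-up box fully inside cell

print(mostly_covered(euro, cell))   # False: zero-area box is not divided
print(mostly_covered(small, cell))  # True: overlap 50 / area 50 = 1.0
```

Returning False for a degenerate box means it is never treated as a duplicate, so a zero-width glyph like '€' would be kept rather than discarded; treating it as covered instead would drop it. Which behaviour is preferable depends on whether such glyphs carry meaningful text.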

stefanw, Jul 27 '21 14:07

Hi, I also had this issue.

camelot-py==0.10.1

The offending text line here also has zero width (x0 = x1 = 202.905): <LTTextLineHorizontal 202.905,138.447,202.905,149.442 '(cid:1)\n'>

  File "/srv/fundcogito/fc-vault_venv/lib/python3.9/site-packages/camelot/parsers/stream.py", line 463, in extract_tables
    cols, rows = self._generate_columns_and_rows(table_idx, tk)
  File "/srv/fundcogito/fc-vault/document_parsers/utils/patch_camelot.py", line 92, in _generate_columns_and_rows
    t_bbox["horizontal"] = text_in_bbox(tk, self.horizontal_text)
  File "/srv/fundcogito/fc-vault_venv/lib/python3.9/site-packages/camelot/utils.py", line 378, in text_in_bbox
    if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
ZeroDivisionError: float division by zero

blackelk, Feb 11 '22 21:02

I have the same issue.

rain01, Feb 16 '22 22:02