Same table extracted twice from PDF in stream mode

Open stertingen opened this issue 1 year ago • 0 comments

Describe the bug

Under some circumstances, Camelot extracts the same table twice. This happened in stream mode: the first extracted table contains only part of the table.

Steps to reproduce the bug

  1. Install camelot-py[base] with pip
  2. Download the PDF file linked below
  3. Run the script below

Expected behavior

I expected camelot to either extract exactly one table or multiple tables which do not overlap.
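
The non-overlap expectation above can be checked mechanically. The following is a minimal sketch, assuming each extracted table's bounding box is available as an (x1, y1, x2, y2) tuple with x1 < x2 and y1 < y2. How to obtain that box from Camelot is not shown here; recent versions appear to keep it in the private attribute table._bbox, but that is an assumption and not a public API. The helper names and the sample coordinates are hypothetical.

```python
from itertools import combinations

# (x1, y1, x2, y2), assumed normalized so that x1 < x2 and y1 < y2
Box = tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share a region of non-zero area."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


def overlapping_pairs(boxes: list[Box]) -> list[tuple[int, int]]:
    """Return index pairs (i, j) with i < j whose boxes overlap."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if boxes_overlap(a, b)
    ]


# Hypothetical coordinates for illustration only, not taken from the PDF.
example_boxes = [
    (50.0, 400.0, 300.0, 700.0),  # partial table
    (50.0, 100.0, 550.0, 700.0),  # larger table covering the same area
    (50.0, 20.0, 550.0, 80.0),    # separate table further down the page
]

print(overlapping_pairs(example_boxes))
```

An empty result would match the expected behavior; any reported pair points at two extracted tables that cover the same region of the page.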

Code

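#!/usr/bin/env python3

import camelot
import matplotlib.pyplot as plt

# Extract tables from page 8 using the stream parser.
tables = camelot.read_pdf("./Lijnfolder-dr-2024-regio-Arnhem.pdf", pages="8", flavor="stream")

# Draw the detected table area of each extracted table.
for table in tables:
    camelot.plot.contour(table)

# Display the figures when run as a script.
plt.show()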

PDF

https://www.connexxion.nl/getmedia/c2bce2c6-ebfe-43a9-8154-0b6bec9244fd/Lijnfolder-dr-2024-regio-Arnhem.pdf

Screenshots

[Two images attached to the original issue]

Environment

  • OS: Windows 11
  • Python version: 3.12.8
  • Numpy version: 2.0.2
  • OpenCV version: 4.11.0
  • Ghostscript version:
  • camelot version: 1.0.0

Additional context

Posted by stertingen, Feb 03 '25 16:02