camelot Tables on multiple pages

Hey there! This is more a question than an issue, sorry! I am using Camelot to extract data from PDFs, some of them large. I have many cases where a table spans more than one page. In some cases, like this one: https://snipboard.io/dMEuF7.jpg, the table on the first page has 14 columns as expected, but the one on the second page has only 13: the leftmost column disappears because there is no ruling line for it (it is actually a single merged cell that runs from page 1 to page 6).

Is there a way to

force Camelot to extract only one table, or
extract the column positions from the table found on the first page? That way I could pass them to the "columns" parameter described in the docs: https://camelot-py.readthedocs.io/en/master/user/advanced.html#specify-column-separators
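
To make the second option concrete, here is a rough, untested sketch of what I have in mind. It assumes the first-page table exposes its column boundaries as `Table.cols`, a list of (x_left, x_right) tuples, and that the "columns" parameter takes one comma-separated string of x-coordinates and applies to the stream flavor. The helper names `separators_from_cols` and `read_spanning_table` are made up for illustration, and I am not sure how "columns" behaves if more than one table is detected on a page.

```python
def separators_from_cols(cols):
    """Turn (x_left, x_right) column boundaries into a separator string.

    The right edge of every column except the last one is an interior
    separator, which is the shape the "columns" option expects.
    """
    ordered = sorted(cols)  # left to right
    return ",".join(f"{right:.1f}" for _left, right in ordered[:-1])


def read_spanning_table(pdf_path, first_page, later_pages):
    """Hypothetical helper: reuse the page-1 column layout on later pages."""
    import camelot  # imported lazily so the helper above works without it

    # Page 1 has all ruling lines, so lattice finds every column.
    first = camelot.read_pdf(pdf_path, pages=str(first_page), flavor="lattice")[0]

    # Assumption: Table.cols holds the detected column boundaries.
    separators = separators_from_cols(first.cols)

    # Later pages lack the leftmost line, so force the same separators.
    rest = camelot.read_pdf(
        pdf_path, pages=later_pages, flavor="stream", columns=[separators]
    )
    return first, rest
```

For example, `read_spanning_table("report.pdf", 1, "2-6")` would read page 1 with lattice and pages 2 to 6 with the separators found there ("report.pdf" is just a placeholder file name).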

Thanks a lot! Luc

Apr 18 '20 16:04 lucmartinon

Hi, did you find a solution for tables spanning multiple pages? I am getting the same issue: when reading such tables, Camelot treats each page as a new table.

Jun 12 '20 18:06 idea1002

Hey,

I didn't find any solution within Camelot, no. Instead, I switched to first converting the PDFs to XLSX using commercial products, then importing the data. It was simply much faster and easier. Converters tested:

smallPDF (10 € / month): works well, also on locked PDFs, but generates an Excel file with one tab per PDF table, so you may have to harmonize the tabs manually. If it really is the same table continuing over many pages, though, it will likely end up in one tab.

Adobe (17 € annually, for the online converter): always converts to a single tab in Excel, and works better overall, in my case at least. Doesn't work with locked PDFs.

Both have a free trial that allows testing. There may be more than these, I stopped searching because I was happy with the result.

Jun 15 '20 09:06 lucmartinon

You could implement a method that accepts templates as parameters for each page, something like the tabula.io.read_pdf_with_template() method. You can find more about it here and here

P.S. Tabula doesn't have it properly implemented; it would be a great addition to Camelot, and one I am actively looking for!
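
A minimal sketch of what such a per-page template could look like on top of Camelot. This is not an existing Camelot API: `kwargs_for_page`, `read_with_template`, and the template dict layout (page number mapped to an optional "table_area" and "columns" string, plus a "default" entry) are all hypothetical. It only assumes that camelot.read_pdf accepts the pages, flavor, table_areas and columns arguments.

```python
def kwargs_for_page(template, page):
    """Build read_pdf keyword arguments for one page from a template.

    Pages without their own entry fall back to the "default" entry.
    """
    spec = template.get(page, template.get("default", {}))
    kwargs = {"pages": str(page), "flavor": "stream"}
    if "table_area" in spec:
        kwargs["table_areas"] = [spec["table_area"]]  # "x1,y1,x2,y2"
    if "columns" in spec:
        kwargs["columns"] = [spec["columns"]]  # "x1,x2,..." separators
    return kwargs


def read_with_template(pdf_path, template, pages):
    """Hypothetical helper: read each page with its own template entry."""
    import camelot  # imported lazily; only needed when actually reading

    tables = []
    for page in pages:
        tables.extend(camelot.read_pdf(pdf_path, **kwargs_for_page(template, page)))
    return tables
```

With a template such as `{1: {"table_area": "30,800,570,40", "columns": "72,150"}, "default": {"columns": "150"}}` (coordinates invented for the example), page 1 would be read with its own area and separators, and every other page with the default separators.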

Jul 19 '21 15:07 KunalGehlot

Aug 20 '21 04:08 qianxuanyon
