Input of different table areas on different pdf pages.

Open RyosukeSakaguchi opened this issue 3 years ago • 1 comment

If a PDF has multiple pages, I want to specify a different table area for each page. In other words, I would like the table_areas argument of camelot.read_pdf() to accept a per-page mapping of table areas, as follows.

table_areas = {
    1: [[10, 20, 30, 40], ..],
    2: [[80, 100, 90, 120], ..],
    4: [[..], ..], ...
}

The dictionary key is the page number.

RyosukeSakaguchi • Jun 23 '22 07:06

This would be very useful! Currently, if you have different table areas on different pages, you need to call read_pdf() separately for each table_areas value and then manually combine the data.
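
A minimal sketch of that workaround, assuming camelot is installed. The helper name read_tables_per_page and its reader parameter are hypothetical and not part of camelot; the helper calls read_pdf() once per page with that page's areas and collects the resulting tables in page order.

```python
def read_tables_per_page(path, areas_by_page, reader=None, **kwargs):
    """Call read_pdf() once per page and collect the tables in page order.

    Hypothetical helper, not part of camelot. `areas_by_page` maps a page
    number to a list of "x1,y1,x2,y2" strings, the format that table_areas
    accepts today. Extra keyword arguments are passed through to read_pdf().
    """
    if reader is None:
        import camelot  # imported lazily so the helper can be tested without it
        reader = camelot.read_pdf

    tables = []
    for page in sorted(areas_by_page):
        found = reader(path, pages=str(page),
                       table_areas=areas_by_page[page], **kwargs)
        tables.extend(found)  # the returned TableList can be iterated over
    return tables
```

It could be called as read_tables_per_page("report.pdf", {1: ["10,20,30,40"], 2: ["80,100,90,120"]}, flavor="stream"), where "report.pdf" is a placeholder. Each returned table exposes its data as table.df, so tables that share the same columns can then be combined with pandas.concat.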

I think the dictionary suggested by @RyosukeSakaguchi is a good approach, but it would probably still use the current string syntax:

table_areas = {
    1: ["10, 20, 30, 40", ..],
    2: ["80, 100, 90, 120", ..],
    4: ["..", ..], ...
}
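
If the areas are kept as numeric lists, as in the original proposal, they can be converted to the string syntax before being passed to read_pdf(). A small sketch; to_area_strings is a hypothetical name, not a camelot function.

```python
def to_area_strings(areas_by_page):
    """Convert {page: [[x1, y1, x2, y2], ...]} to {page: ["x1,y1,x2,y2", ...]}."""
    converted = {}
    for page, areas in areas_by_page.items():
        strings = []
        for area in areas:
            if len(area) != 4:
                raise ValueError(
                    f"page {page}: expected 4 coordinates, got {len(area)}"
                )
            # Join the four numbers into the comma-separated form
            strings.append(",".join(str(value) for value in area))
        converted[page] = strings
    return converted


print(to_area_strings({1: [[10, 20, 30, 40]], 2: [[80, 100, 90, 120]]}))
# {1: ['10,20,30,40'], 2: ['80,100,90,120']}
```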

goto-loop • Apr 19 '23 12:04