tabulapdf

Multiple tables in 1 page

Open leeper opened this issue 7 years ago • 4 comments

Migrated from https://github.com/ropenscilabs/tabulizerjars/issues/1 (@khun84)

Is there a parameter that I can pass in to extract more than 1 table per page?

I have a pdf page with 2 tables:

  • table 1 is 2 columns and multiple rows
  • table 2 has 2 columns and multiple rows, but some of the cells are merged.

I use the extract_tables() function with default parameters and the output only has 1 table (table 1).

What I can think of is to set method = 'asis', but I do not know how to proceed with the Java object it returns. Is there any documentation I can refer to?

leeper avatar Sep 22 '16 12:09 leeper

@khun84 Yes, you can specify the page number twice, along with the area (or use the extract_areas() function to specify those areas interactively).

So something like extract_areas(file, pages = c(1,1)). This will give you the chance to extract two different areas from a given page.
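For a non-interactive version of the same idea, here is a minimal sketch using extract_tables() with one area per repeated page number. The file name and all coordinates are hypothetical placeholders, and it assumes the package is installed under the name tabulizer, as it was when this thread was written. Each area is given as c(top, left, bottom, right) in points.

```r
# Hypothetical input file; replace with your own PDF.
pdf_file <- "report.pdf"

# One entry per table, each as c(top, left, bottom, right) in points.
# These coordinates are made up for illustration only.
areas <- list(
  c(50, 40, 300, 560),   # assumed region of table 1
  c(320, 40, 700, 560)   # assumed region of table 2
)

# Repeat the page number once per area so both areas are read from page 1.
pages <- rep(1, length(areas))

# Only run the extraction when the package and the file are actually available.
if (requireNamespace("tabulizer", quietly = TRUE) && file.exists(pdf_file)) {
  tables <- tabulizer::extract_tables(
    pdf_file,
    pages = pages,
    area  = areas,
    guess = FALSE   # use the supplied areas instead of automatic detection
  )
  str(tables)       # expect one list element per area
}
```

The pages vector and the areas list must have the same length, since each page entry is paired with the area at the same position.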

You can pursue the Java approach, but it's really only useful if you know the underlying tabula Java library well; and that is not very well documented anywhere.

leeper avatar Sep 22 '16 12:09 leeper

Thanks for the clarification. I've tried extract_areas(file, c(1, 1)), but it returns the same table twice. If I have to explicitly define the area for both tables, then my code will break when the position of the tables changes.

Is there any function that can return the entire content of the PDF in a DOM-like format? In that case, I could traverse the DOM tree and extract what I want.

khun84 avatar Sep 22 '16 17:09 khun84

Hi @leeper - I've recently run into similar issues, but with multi-page documents and a varying number of tables per page. I found that the 'spreadsheet' method, on the command line and/or via Tabula's interface, will pull them all out. The write_csv function writes them all out correctly (at least in the cases I've tested), but the list_matrices function doesn't.

I've edited the list_matrices function to fix this. Would you be happy to receive a pull request?

SteveLane avatar Dec 21 '16 04:12 SteveLane

Yes, please send a PR!

leeper avatar Dec 21 '16 07:12 leeper