
Subscript out of bounds error for much the same PDF

Open 21-Hidetaka-Ko opened this issue 7 years ago • 1 comments

Thanks for this awesome package. It works well on all the PDF documents I have tried it on. However, I have a problem with extract_tables(), shown below. You can reproduce it in RStudio as well.

This works with the 2015 PDF:

library(tabulizer)
path2pdf <- "/Users/HidetakaKo/Desktop/2015-cookpad.pdf"
out <- extract_tables(path2pdf)
as.data.frame(out[[1]])

2015

This doesn't work with the 2016 PDF:

library(tabulizer)
path2pdf <- "/Users/HidetakaKo/Desktop/2016-cookpad.pdf"
out <- extract_tables(path2pdf)
as.data.frame(out[[1]])

2016

(Screenshot taken 2016-08-28 at 23:07:41)

The format of the 2016 PDF is much the same as that of the 2015 one.

I'm working on a MacBook Air with OS X 10.11.6, R 3.3.1, Exploratory Desktop, and RStudio version 0.99.887.

21-Hidetaka-Ko, Aug 30 '16 01:08

What does out look like (e.g., can you show str(out))? It seems like the extraction is working, but the format isn't exactly what you're looking for. You might try modifying the method argument of extract_tables() so that you get something other than a matrix back.

leeper avatar Aug 30 '16 05:08 leeper
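
The thread does not show str(out), so the following is only a sketch of one way to check the result before indexing it. It assumes the "subscript out of bounds" error in the title comes from out[[1]] being evaluated on an empty list, which is what base R reports when no first element exists; the actual cause for the 2016 PDF may be different. The helper name first_table is hypothetical and not part of tabulizer, and the two stand-in lists replace real extract_tables() output so the example runs without a PDF.

```r
# Hypothetical helper (not part of tabulizer): return the first extracted
# table as a data frame, or NULL when the result list is empty.
first_table <- function(out) {
  # `out[[1]]` on an empty list raises "subscript out of bounds",
  # so check the length before indexing.
  if (length(out) == 0) {
    message("No tables were extracted; inspect the result with str(out).")
    return(NULL)
  }
  as.data.frame(out[[1]], stringsAsFactors = FALSE)
}

# Stand-ins for extract_tables() results (assumed shapes, not real output).
empty_result <- list()
one_table    <- list(matrix(c("a", "b", "c", "d"), nrow = 2))

str(empty_result)          # what str(out) prints when nothing was found
first_table(empty_result)  # NULL with a message instead of an error
first_table(one_table)     # data frame built from the 2 x 2 matrix
```

With real data, the same check would be applied as first_table(extract_tables(path2pdf)), and the output of str(out) is what the reply above asks to see.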