tabulapdf
Subscript out of bounds error for much the same PDF
Thanks for this awesome package. It works well on all the PDF documents I have tried it on. However, I have a problem with extract_tables(), shown below. You can reproduce this in your RStudio, too.
This works with the 2015 PDF:
library(tabulizer)
path2pdf <- "/Users/HidetakaKo/Desktop/2015-cookpad.pdf"
out <- extract_tables(path2pdf)
as.data.frame(out[[1]])
This doesn't work with the 2016 PDF:
library(tabulizer)
path2pdf <- "/Users/HidetakaKo/Desktop/2016-cookpad.pdf"
out <- extract_tables(path2pdf)
as.data.frame(out[[1]])
The format of the 2016 PDF is much the same as that of the 2015 one.
I'm working on a MacBook Air with OS X 10.11.6, R 3.3.1, Exploratory Desktop, and RStudio Version 0.99.887.
What does out look like? (e.g., can you show str(out)?) It seems like the extraction is working, but the format isn't exactly what you're looking for. You might try modifying the method argument of extract_tables() so that you get something other than a matrix back.
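One way to check what came back before indexing into it is sketched below in base R. This is only an illustration under assumptions: `first_table()` is a hypothetical helper, not part of tabulizer, and `ok_result` and `empty_result` are made-up stand-ins for what extract_tables() might return. It assumes the error comes from out being an empty list, which the thread has not confirmed.

```r
# Hypothetical helper (not part of tabulizer): return the first extracted
# table as a data frame, or NULL when the list holds no tables.
first_table <- function(tables) {
  if (length(tables) < 1) {
    # Indexing an empty list with [[1]] raises "subscript out of bounds".
    message("No tables were returned, so tables[[1]] would fail.")
    return(NULL)
  }
  as.data.frame(tables[[1]])
}

# Stand-in values, assumed for illustration only.
ok_result    <- list(matrix(c("a", "b", "c", "d"), nrow = 2))
empty_result <- list()

str(empty_result)           # inspect the structure, as asked above
first_table(ok_result)      # a data frame built from the first matrix
first_table(empty_result)   # NULL, with a message instead of an error
```

If str(out) for the 2016 file shows an empty list, the problem is that no table was detected, rather than the conversion to a data frame.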