extract_tables() returns an error about the current working directory
Question
Why does extract_tables() return an error message about the current working directory even though both the directory and the file exist?
Reproducible example
Below is the code, where x is the path to the PDF file. extract_tables() does in fact return the tables, albeit with an error. The error seems to be related to the third element of the returned list.
I don't understand what it has to do with the working directory. Could someone please explain why this is happening and suggest how to resolve the error?
Code
```
> x <- list.files(full.names = TRUE)[1]
> file.exists(x)
[1] TRUE
> extract_tables(x, method = "decide")
Error : 'TOTAL PAYABLE 75,564.00 10,578.96 86,142.95' does not exist in current working directory ('C:/Users/<username>/OneDrive/Documents/Destination').
[[1]]
# A tibble: 6 × 8
QTY ITEM `KG/ITEM` `R/ITEM` `R/KG` EXCL VAT INCL
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 COW : BLACK 438 8500 19.4 17000 2380 19380
2 4 COW : BLK 481 9300 19.3 37200 5208 42408
3 1 COW : BLK 455 9400 20.7 9400 1316 10716
4 1 HEIFER : BLK 385 8800 22.9 8800 1232 10032
5 1 COW : RED 380 8750 23.0 8750 1225 9975
6 9+0 NA 4020 NA NA 81150 11361 92511
[[2]]
# A tibble: 5 × 5
`AAM COMMISSION` `5.00%` `4,057.50` `568.05` `4,625.55`
<chr> <chr> <dbl> <dbl> <dbl>
1 RMLA Statutory Levy 51.0 7.14 58.2
2 RPO 0.15% 122. 17.0 139.
3 FARMERS ASSOC HDLA :: 0.5%m 406. 56.8 463.
4 TRANSPORT FRYSLAN 950 133 1083
5 TOTAL DEDUCTIONS NA 5586. 782. 6368.
[[3]]
[1] "TOTAL PAYABLE\t75,564.00\t10,578.96\t86,142.95"
[[4]]
# A tibble: 3 × 7
`CATEGORY SUMMARY` QTY `R/KG` `TOTAL KG` `KG/ITEM` TOTAL `R/ITEM`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 COW 8 19.9 3635 454 72350 9044.
2 HEIFER 1 22.9 385 385 8800 8800
3 TOTAL 9 20.2 4020 447 81150 9017.
```
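Until the underlying cause is addressed in the package, one possible workaround is to post-process the result. The sketch below assumes that any table tabulapdf fails to parse comes back as a character string of tab-separated values, as element [[3]] does above. `reparse_failed_tables()` is a hypothetical helper, not a tabulapdf function; it re-parses such strings with base R's `read.delim(text = ...)` and leaves already-parsed tables untouched.

```r
# Hypothetical helper, not part of tabulapdf: re-parse list elements that
# extract_tables() returned as raw tab-separated strings instead of tables.
reparse_failed_tables <- function(tables) {
  lapply(tables, function(tbl) {
    if (is.character(tbl)) {
      # `text =` makes read.delim() parse the string itself, never a file path
      utils::read.delim(text = tbl, header = FALSE, stringsAsFactors = FALSE)
    } else {
      tbl  # already parsed: leave unchanged
    }
  })
}

# Stand-in for the list returned above; the second element mimics [[3]]
tables <- list(
  data.frame(QTY = "2", ITEM = "COW : BLACK"),
  "TOTAL PAYABLE\t75,564.00\t10,578.96\t86,142.95"
)
fixed <- reparse_failed_tables(tables)
fixed[[2]]  # a one-row data frame with columns V1 to V4
```

Because `read.delim()` has no thousands-separator option, values such as `75,564.00` stay character; remove the commas before calling `as.numeric()` if numbers are needed.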
First of all, thank you for this package! I appreciate the time it takes to write and release one!
I'm getting this error too. Here is the output from an example using "argentina.pdf", which ships with the package:
```
> f <- system.file("examples", "argentina.pdf", package = "tabulapdf")
> # Just to show that this is the right file and that tabulapdf is working
> extract_tables(f)
New names:
• `` -> `...4`
[[1]]
# A tibble: 31 × 4
`Apellido y Nombre` `Bloque político` Provincia ...4
<chr> <chr> <chr> <chr>
1 ABDALA de MATARAZZO, Norma Amanda Frente Cívico por Santiago Santiago del Estero AFIRMATIVO
2 ALBRIEU, Oscar Edmundo Nicolas Frente para la Victoria - PJ Rio Negro AFIRMATIVO
3 ALONSO, María Luz Frente para la Victoria - PJ La Pampa AFIRMATIVO
4 ARENA, Celia Isabel Frente para la Victoria - PJ Santa Fe AFIRMATIVO
5 ARREGUI, Andrés Roberto Frente para la Victoria - PJ Buenos Aires AFIRMATIVO
6 AVOSCAN, Herman Horacio Frente para la Victoria - PJ Rio Negro AFIRMATIVO
7 BALCEDO, María Ester Frente para la Victoria - PJ Buenos Aires AFIRMATIVO
8 BARRANDEGUY, Raúl Enrique Frente para la Victoria - PJ Entre Ríos AFIRMATIVO
9 BASTERRA, Luis Eugenio Frente para la Victoria - PJ Formosa AFIRMATIVO
10 BEDANO, Nora Esther Frente para la Victoria - PJ Córdoba AFIRMATIVO
# ℹ 21 more rows
# ℹ Use `print(n = ...)` to see more rows
```
Now I select just one row, which is the situation in which I got the error:

```
> a <- locate_areas(f,pages=1)[[1]]
Loading required namespace: shiny
Loading required namespace: miniUI
Loading required package: shiny
Listening on http://127.0.0.1:5031
> a
     top     left   bottom    right
274.4960 389.2861 293.2971 557.2228
```

And here is the error. As you can see, `b` still contains the correct information:

```
> b <- extract_tables(f, pages=1, area=list(a), col_names=F, guess=FALSE)[[1]]
Error : 'Santiago del Estero AFIRMATIVO' does not exist in current working directory ('C:/Users/ogawajr/OneDrive - National Institutes of Health/Documents/Analyses/OPEN Door/PDF Scraping').
> b
[1] "Santiago del Estero\tAFIRMATIVO"
```
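A plausible explanation, not confirmed in this thread: both failing tables (element [[3]] in the first report and the one-row area here) consist of a single row, so the tab-separated text that tabulapdf extracts contains no newline. If tabulapdf hands that text to readr for parsing (an assumption), note that readr (>= 2.0) documents that a string is treated as literal data only when it is wrapped in `I()` or contains at least one newline; otherwise it is treated as a file path, which would produce exactly this "does not exist in current working directory" message. `is_literal_data()` below is a hypothetical, simplified restatement of that documented rule, not readr's own code.

```r
# Hypothetical, simplified restatement of readr's documented rule for its
# `file` argument (not readr's own implementation): a string counts as
# literal data only if it is wrapped in I() or contains a newline.
is_literal_data <- function(x) {
  inherits(x, "AsIs") || any(grepl("\n", x, fixed = TRUE))
}

one_row      <- "Santiago del Estero\tAFIRMATIVO"
with_newline <- paste0(one_row, "\n")

is_literal_data(one_row)       # FALSE: would be treated as a file path
is_literal_data(with_newline)  # TRUE: parsed as data
is_literal_data(I(one_row))    # TRUE: I() marks the string as literal data
```

If this is indeed the cause, wrapping the extracted text in `I()` before it is parsed would avoid the error for one-row tables.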
Hi @frandjango @jrogawa, can you please send me your session info?
Done! Should be in your inbox.
Hi @frandjango @jrogawa,
First of all, my apologies for the delay; I was on medical leave.
I could not replicate the error. Here is what I see:
```
> library(tabulapdf)
> f <- system.file("examples", "argentina.pdf", package = "tabulapdf")
> extract_tables(f)
New names:
• `` -> `...4`
[[1]]
# A tibble: 31 × 4
`Apellido y Nombre` `Bloque político` Provincia ...4
<chr> <chr> <chr> <chr>
1 ABDALA de MATARAZZO, Norma Amanda Frente Cívico por Santiago Santiago… AFIR…
2 ALBRIEU, Oscar Edmundo Nicolas Frente para la Victoria - … Rio Negro AFIR…
3 ALONSO, María Luz Frente para la Victoria - … La Pampa AFIR…
4 ARENA, Celia Isabel Frente para la Victoria - … Santa Fe AFIR…
5 ARREGUI, Andrés Roberto Frente para la Victoria - … Buenos A… AFIR…
6 AVOSCAN, Herman Horacio Frente para la Victoria - … Rio Negro AFIR…
7 BALCEDO, María Ester Frente para la Victoria - … Buenos A… AFIR…
8 BARRANDEGUY, Raúl Enrique Frente para la Victoria - … Entre Rí… AFIR…
9 BASTERRA, Luis Eugenio Frente para la Victoria - … Formosa AFIR…
10 BEDANO, Nora Esther Frente para la Victoria - … Córdoba AFIR…
# ℹ 21 more rows
# ℹ Use `print(n = ...)` to see more rows
> a <- locate_areas(f,pages=1)[[1]]
Loading required package: shiny
Listening on http://127.0.0.1:7171
> a
top left bottom right
252.86617 20.67808 794.08851 564.56217
> b <- extract_tables(f, pages=1, area=list(a), col_names=F, guess=FALSE)[[1]]
> b
# A tibble: 32 × 4
X1 X2 X3 X4
<chr> <chr> <chr> <chr>
1 Apellido y Nombre Bloque político Provinc… NA
2 ABDALA de MATARAZZO, Norma Amanda Frente Cívico por Santiago Santiag… AFIR…
3 ALBRIEU, Oscar Edmundo Nicolas Frente para la Victoria - PJ Rio Neg… AFIR…
4 ALONSO, María Luz Frente para la Victoria - PJ La Pampa AFIR…
5 ARENA, Celia Isabel Frente para la Victoria - PJ Santa Fe AFIR…
6 ARREGUI, Andrés Roberto Frente para la Victoria - PJ Buenos … AFIR…
7 AVOSCAN, Herman Horacio Frente para la Victoria - PJ Rio Neg… AFIR…
8 BALCEDO, María Ester Frente para la Victoria - PJ Buenos … AFIR…
9 BARRANDEGUY, Raúl Enrique Frente para la Victoria - PJ Entre R… AFIR…
10 BASTERRA, Luis Eugenio Frente para la Victoria - PJ Formosa AFIR…
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows
```
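One difference between this attempt and the report is the selected area: the area `a` chosen here spans the whole table (the resulting `b` has 32 rows), while the reporter selected a single row. The sketch below compares the heights of the two selections using only the coordinates printed in this thread; `area_height()` is a hypothetical helper. The commented-out call shows how the reporter's coordinates could be passed straight to `extract_tables()` through `area = list(...)`, skipping the interactive `locate_areas()` step; it is not run here because it needs tabulapdf and Java.

```r
# Area selections (top, left, bottom, right) copied from the two sessions above
reporter_area   <- c(top = 274.4960,  left = 389.2861, bottom = 293.2971,  right = 557.2228)
maintainer_area <- c(top = 252.86617, left = 20.67808, bottom = 794.08851, right = 564.56217)

# Hypothetical helper: height of a selection in PDF points
area_height <- function(area) unname(area["bottom"] - area["top"])

area_height(reporter_area)    # about 18.8 points: the single row the reporter selected
area_height(maintainer_area)  # about 541 points: the whole table

# Not run here (requires tabulapdf and Java):
# f <- system.file("examples", "argentina.pdf", package = "tabulapdf")
# extract_tables(f, pages = 1, area = list(reporter_area),
#                col_names = FALSE, guess = FALSE)
```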