extract_tables() returns error corresponding to working directory

Open frandjango opened this issue 9 months ago • 4 comments

Question

Why does extract_tables() return an error message that refers to the working directory when both the directory and the file exist?

Reproducible example

Below is the code, where x is the full path to the PDF file. extract_tables() does in fact return the tables, albeit with an error. The error seems to be related to the third element of the returned list.

I don't understand what it has to do with the working directory. Could someone please explain why this is happening and suggest how to resolve the error?

Code

x <- list.files(full.names = TRUE)[1]
file.exists(x)
[1] TRUE

extract_tables(x, method = "decide")

Error : 'TOTAL PAYABLE  75,564.00   10,578.96   86,142.95' does not exist in current working directory ('C:/Users/<username>/OneDrive/Documents/Destination').                                    
[[1]]                                                                                                                                                                                  
# A tibble: 6 × 8
  QTY   ITEM         `KG/ITEM` `R/ITEM` `R/KG`  EXCL   VAT  INCL
  <chr> <chr>            <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl>
1 2     COW : BLACK        438     8500   19.4 17000  2380 19380
2 4     COW : BLK          481     9300   19.3 37200  5208 42408
3 1     COW : BLK          455     9400   20.7  9400  1316 10716
4 1     HEIFER : BLK       385     8800   22.9  8800  1232 10032
5 1     COW : RED          380     8750   23.0  8750  1225  9975
6 9+0   NA                4020       NA   NA   81150 11361 92511

[[2]]
# A tibble: 5 × 5
  `AAM COMMISSION` `5.00%`        `4,057.50` `568.05` `4,625.55`
  <chr>            <chr>               <dbl>    <dbl>      <dbl>
1 RMLA             Statutory Levy       51.0     7.14       58.2
2 RPO              0.15%               122.     17.0       139. 
3 FARMERS ASSOC    HDLA :: 0.5%m       406.     56.8       463. 
4 TRANSPORT        FRYSLAN             950     133        1083  
5 TOTAL DEDUCTIONS NA                 5586.    782.       6368. 

[[3]]
[1] "TOTAL PAYABLE\t75,564.00\t10,578.96\t86,142.95"

[[4]]
# A tibble: 3 × 7
  `CATEGORY SUMMARY`   QTY `R/KG` `TOTAL KG` `KG/ITEM` TOTAL `R/ITEM`
  <chr>              <dbl>  <dbl>      <dbl>     <dbl> <dbl>    <dbl>
1 COW                    8   19.9       3635       454 72350    9044.
2 HEIFER                 1   22.9        385       385  8800    8800 
3 TOTAL                  9   20.2       4020       447 81150    9017.

frandjango · Mar 12 '25 10:03
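In the output above, the third list element is a single tab-separated string rather than a tibble. The following is a minimal base R sketch of one way to work with such a result while the error is unresolved: it separates the parsed tables from the raw strings and parses the string by hand. The object `res` is a hypothetical stand-in built from values shown in the issue, not real `extract_tables()` output, and this is a workaround idea rather than a fix confirmed by the package author.

```r
# Hypothetical stand-in for the returned list: one parsed table and one raw,
# tab-separated string. The values are copied from the output in the issue.
res <- list(
  data.frame(QTY = c("2", "4"), ITEM = c("COW : BLACK", "COW : BLK")),
  "TOTAL PAYABLE\t75,564.00\t10,578.96\t86,142.95"
)

# Separate the elements that were parsed from those that came back as raw text.
# Tibbles are data frames, so is.data.frame() also matches real tibble output.
is_table <- vapply(res, is.data.frame, logical(1))
tables   <- res[is_table]
raw_rows <- unlist(res[!is_table])

# Parse each raw row with base R. The `text` argument is always read as data,
# never as a file path.
parsed <- lapply(raw_rows, function(row) {
  utils::read.delim(text = row, header = FALSE, sep = "\t",
                    stringsAsFactors = FALSE, colClasses = "character")
})

# Strip the thousands separators before converting the amounts to numbers.
total   <- parsed[[1]]
amounts <- as.numeric(gsub(",", "", unlist(total[1, -1])))
amounts
```

Keeping the raw rows as character columns first avoids silently turning values such as "75,564.00" into NA during type conversion.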

First of all, thank you for this package! I appreciate the time it takes to write and release one!

I, too, am getting this error. Here is output from an example using "argentina.pdf" from the package:

f <- system.file("examples", "argentina.pdf", package = "tabulapdf")

Just to show that this is the right file and that tabulapdf is working

extract_tables(f)
New names:
• `` -> `...4`
[[1]]
# A tibble: 31 × 4
   `Apellido y Nombre`               `Bloque político`            Provincia           ...4
 1 ABDALA de MATARAZZO, Norma Amanda Frente Cívico por Santiago   Santiago del Estero AFIRMATIVO
 2 ALBRIEU, Oscar Edmundo Nicolas    Frente para la Victoria - PJ Rio Negro           AFIRMATIVO
 3 ALONSO, María Luz                 Frente para la Victoria - PJ La Pampa            AFIRMATIVO
 4 ARENA, Celia Isabel               Frente para la Victoria - PJ Santa Fe            AFIRMATIVO
 5 ARREGUI, Andrés Roberto           Frente para la Victoria - PJ Buenos Aires        AFIRMATIVO
 6 AVOSCAN, Herman Horacio           Frente para la Victoria - PJ Rio Negro           AFIRMATIVO
 7 BALCEDO, María Ester              Frente para la Victoria - PJ Buenos Aires        AFIRMATIVO
 8 BARRANDEGUY, Raúl Enrique         Frente para la Victoria - PJ Entre Ríos          AFIRMATIVO
 9 BASTERRA, Luis Eugenio            Frente para la Victoria - PJ Formosa             AFIRMATIVO
10 BEDANO, Nora Esther               Frente para la Victoria - PJ Córdoba             AFIRMATIVO
# 21 more rows
# Use `print(n = ...)` to see more rows

Now I capture just one row, which is the circumstance where I got the error

a <- locate_areas(f,pages=1)[[1]]
Loading required namespace: shiny
Loading required namespace: miniUI
Loading required package: shiny

Listening on http://127.0.0.1:5031

a
     top     left   bottom    right
274.4960 389.2861 293.2971 557.2228

And here's the error. As you can see, it still returns the correct info in b.

b <- extract_tables(f, pages=1, area=list(a), col_names=F, guess=FALSE)[[1]]
Error : 'Santiago del Estero AFIRMATIVO' does not exist in current working directory ('C:/Users/ogawajr/OneDrive - National Institutes of Health/Documents/Analyses/OPEN Door/PDF Scraping').

b

[1] "Santiago del Estero\tAFIRMATIVO"

jrogawa · Apr 01 '25 13:04
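The wording of the error resembles the message readr gives for a path that does not exist. One possible reading, not confirmed anywhere in this thread, is that a single extracted row reaches a readr reader as a string with no newline. readr 2.x treats such a string as a file path unless it is wrapped in `I()`. The sketch below only illustrates that readr behaviour. Whether tabulapdf calls readr this way internally is an assumption.

```r
# A single extracted row with no newline, copied from the output above.
one_row <- "Santiago del Estero\tAFIRMATIVO"

res_path <- NULL
res_literal <- NULL

# Guarded so the sketch still runs where readr (>= 2.0.0, for I()) is missing.
if (requireNamespace("readr", quietly = TRUE) &&
    utils::packageVersion("readr") >= "2.0.0") {
  # No newline and no I(): the string is looked up as a file path and fails.
  res_path <- try(
    readr::read_tsv(one_row, col_names = FALSE, show_col_types = FALSE),
    silent = TRUE
  )
  # Wrapped in I(): the same string is parsed as literal data.
  res_literal <- readr::read_tsv(I(one_row), col_names = FALSE,
                                 show_col_types = FALSE)
}
```

If this reading is right, results with two or more rows would contain a newline and parse normally, which would fit both reports: only the single-row extraction fails.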

hi @frandjango @jrogawa can you please send me your session info?

pachadotdev · Apr 01 '25 17:04
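For anyone else hitting this error, session info can be collected with base R as sketched below. The package names in `pkgs` are a guess at what may be relevant here, not a list requested by the maintainer.

```r
# R version, operating system, locale, and attached or loaded packages.
info <- utils::sessionInfo()
print(info)

# Versions of packages that may be relevant (the names are assumptions).
# system.file() checks for installation without loading the package.
pkgs <- c("tabulapdf", "rJava", "readr")
installed <- pkgs[vapply(pkgs, function(p) nzchar(system.file(package = p)),
                         logical(1))]
versions <- vapply(installed,
                   function(p) as.character(utils::packageVersion(p)),
                   character(1))
print(versions)
```

Posting the printed output in the issue itself, rather than by email, lets other readers compare their own setup.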

Done! Should be in your inbox.

jrogawa · Apr 01 '25 17:04

hi @frandjango @jrogawa

first of all, my apologies for the delay, I was on medical leave

I could not replicate the error. Here is what I see:

> library(tabulapdf)
> f <- system.file("examples", "argentina.pdf", package = "tabulapdf")
> extract_tables(f)
New names:
• `` -> `...4`
[[1]]
# A tibble: 31 × 4
   `Apellido y Nombre`               `Bloque político`           Provincia ...4 
   <chr>                             <chr>                       <chr>     <chr>
 1 ABDALA de MATARAZZO, Norma Amanda Frente Cívico por Santiago  Santiago… AFIR…
 2 ALBRIEU, Oscar Edmundo Nicolas    Frente para la Victoria - … Rio Negro AFIR…
 3 ALONSO, María Luz                 Frente para la Victoria - … La Pampa  AFIR…
 4 ARENA, Celia Isabel               Frente para la Victoria - … Santa Fe  AFIR…
 5 ARREGUI, Andrés Roberto           Frente para la Victoria - … Buenos A… AFIR…
 6 AVOSCAN, Herman Horacio           Frente para la Victoria - … Rio Negro AFIR…
 7 BALCEDO, María Ester              Frente para la Victoria - … Buenos A… AFIR…
 8 BARRANDEGUY, Raúl Enrique         Frente para la Victoria - … Entre Rí… AFIR…
 9 BASTERRA, Luis Eugenio            Frente para la Victoria - … Formosa   AFIR…
10 BEDANO, Nora Esther               Frente para la Victoria - … Córdoba   AFIR…
# ℹ 21 more rows
# ℹ Use `print(n = ...)` to see more rows

> a <- locate_areas(f,pages=1)[[1]]
Loading required package: shiny
> a
      top      left    bottom     right 
252.86617  20.67808 794.08851 564.56217 

Listening on http://127.0.0.1:7171

> b <- extract_tables(f, pages=1, area=list(a), col_names=F, guess=FALSE)[[1]]
> b       
# A tibble: 32 × 4
   X1                                X2                           X3       X4   
   <chr>                             <chr>                        <chr>    <chr>
 1 Apellido y Nombre                 Bloque político              Provinc… NA   
 2 ABDALA de MATARAZZO, Norma Amanda Frente Cívico por Santiago   Santiag… AFIR…
 3 ALBRIEU, Oscar Edmundo Nicolas    Frente para la Victoria - PJ Rio Neg… AFIR…
 4 ALONSO, María Luz                 Frente para la Victoria - PJ La Pampa AFIR…
 5 ARENA, Celia Isabel               Frente para la Victoria - PJ Santa Fe AFIR…
 6 ARREGUI, Andrés Roberto           Frente para la Victoria - PJ Buenos … AFIR…
 7 AVOSCAN, Herman Horacio           Frente para la Victoria - PJ Rio Neg… AFIR…
 8 BALCEDO, María Ester              Frente para la Victoria - PJ Buenos … AFIR…
 9 BARRANDEGUY, Raúl Enrique         Frente para la Victoria - PJ Entre R… AFIR…
10 BASTERRA, Luis Eugenio            Frente para la Victoria - PJ Formosa  AFIR…
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows

pachadotdev · Apr 02 '25 15:04
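The two runs in this thread used different areas. The second comment selected a strip about 19 points tall (a single row), while the run above selected about 541 points (the whole table, 32 rows). A replication attempt may need the single-row area. The sketch below hard-codes the coordinates reported earlier in the thread, so `locate_areas()` is not needed. It is guarded because tabulapdf and Java may not be available, and it does not assume any particular outcome.

```r
# Area reported in the second comment: a single row on page 1.
one_row_area <- c(top = 274.4960, left = 389.2861,
                  bottom = 293.2971, right = 557.2228)

# Area from the maintainer's run, which spans the whole table.
full_area <- c(top = 252.86617, left = 20.67808,
               bottom = 794.08851, right = 564.56217)

# Height of each selection in points.
heights <- c(
  one_row = unname(one_row_area["bottom"] - one_row_area["top"]),
  full    = unname(full_area["bottom"] - full_area["top"])
)
print(heights)

# Re-run the extraction with the single-row area, if tabulapdf can be loaded.
if (requireNamespace("tabulapdf", quietly = TRUE)) {
  f <- system.file("examples", "argentina.pdf", package = "tabulapdf")
  b <- try(
    tabulapdf::extract_tables(f, pages = 1, area = list(one_row_area),
                              col_names = FALSE, guess = FALSE)[[1]]
  )
  str(b)
}
```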