tabulapdf
inconsistent behavior of extract_tables and extract_areas
Please specify whether your issue is about:
- [x] a possible bug
First: Thank you very much for this awesome package. It has saved me tremendous headaches in the past!
Now I am seeing a weird behavior that I cannot really wrap my head around.
When I call extract_areas() and select the table interactively, the result looks fine: I get back a complete table in the usual format.
When I call extract_tables() with the exact same area specified, the result is only list(). I do not understand why one returns the table and the other does not. I would appreciate your input!
Thanks in advance.
Put your code here:
## rJava loads successfully
# install.packages("rJava")
library("rJava")
library("tidyverse")
## load package
library("tabulizer")
httr::GET(
  "https://www.bmwi.de/Redaktion/DE/Publikationen/Aussenwirtschaft/ruestungsexportbericht-2019.pdf?__blob=publicationFile",
  httr::write_disk("temp.pdf")
)
tabulizer::extract_areas("temp.pdf",
                         pages = 82) %>%
  as.data.frame()
tabulizer::extract_tables("temp.pdf",
                          pages = 82)
locate_areas("temp.pdf",
             pages = 82)
tabulizer::extract_tables("temp.pdf",
                          pages = 82,
                          area = list(c(169.78232, 32.63903, 735.16167, 551.83787)))
## session info for your system
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rJava_1.0-4 rvest_1.0.1 jsonlite_1.7.2 httr_1.4.2 shiny_1.7.0 pdftools_3.0.1 tabulizer_0.2.2
[8] SWPcdR_0.0.0.9000 extrafont_0.17 janitor_2.1.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[15] readr_2.0.1 tidyr_1.1.4 tibble_3.1.4 ggplot2_3.3.5 tidyverse_1.3.1 pacman_0.5.1
loaded via a namespace (and not attached):
[1] fs_1.5.0 sf_1.0-2 lubridate_1.7.10 tools_4.1.1 padr_0.6.0 backports_1.2.1
[7] bslib_0.3.0 utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-20 DBI_1.1.1 colorspace_2.0-2
[13] withr_2.4.3 sp_1.4-5 tidyselect_1.1.1 curl_4.3.2 compiler_4.1.1 extrafontdb_1.0
[19] cli_3.0.1 xml2_1.3.2 sass_0.4.0 scales_1.1.1 classInt_0.4-3 proxy_0.4-26
[25] askpass_1.1 digest_0.6.27 pkgconfig_2.0.3 htmltools_0.5.2 dbplyr_2.1.1 fastmap_1.1.0
[31] rlang_0.4.11 readxl_1.3.1 rstudioapi_0.13 jquerylib_0.1.4 generics_0.1.1 magrittr_2.0.1
[37] Rcpp_1.0.7 munsell_0.5.0 fansi_0.5.0 lifecycle_1.0.1 stringi_1.7.4 snakecase_0.11.0
[43] grid_4.1.1 promises_1.2.0.1 crayon_1.4.2 miniUI_0.1.1.1 lattice_0.20-44 haven_2.4.3
[49] hms_1.1.1 pillar_1.6.4 reprex_2.0.1 glue_1.4.2 qpdf_1.1 modelr_0.1.8
[55] tabulizerjars_1.0.1 selectr_0.4-2 png_0.1-7 vctrs_0.3.8 tzdb_0.1.2 httpuv_1.6.3
[61] Rttf2pt1_1.3.9 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6 mime_0.12
[67] xtable_1.8-4 broom_0.7.10 countrycode_1.3.0 e1071_1.7-8 rnaturalearth_0.1.0 later_1.3.0
[73] class_7.3-19 giscoR_0.2.4 units_0.7-2 writexl_1.4.0 ellipsis_0.3.2
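One thing that may be worth checking, offered as a sketch and not a confirmed diagnosis: extract_tables() has a guess argument, and as far as I can tell it defaults to TRUE, in which case the area argument may be ignored in favor of automatic table detection. extract_areas() appears to call extract_tables() with guess = FALSE, which would explain the difference. The call below reuses the coordinates from the report and only adds guess = FALSE. Whether this actually returns the table for this PDF is an assumption I have not verified.

```r
library("tabulizer")

# Assumption: with the default guess = TRUE, `area` may be ignored and the
# automatic detection may find no table on this page, giving list().
# Passing guess = FALSE asks extract_tables() to use the supplied area.
tables <- tabulizer::extract_tables(
  "temp.pdf",
  pages = 82,
  area  = list(c(169.78232, 32.63903, 735.16167, 551.83787)),  # top, left, bottom, right
  guess = FALSE
)

# Inspect what came back; an empty list would mean the area still matched nothing.
str(tables)
```

If this returns the same table as extract_areas(), the two functions are consistent and the difference comes down to the default value of guess.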