Warning For Reflective Access

Open billdenney opened this issue 5 years ago • 16 comments

When working with the current versions of R and rJava, extract_tables() produces the following warning:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by RJavaTools to method java.util.ArrayList$Itr.hasNext()
WARNING: Please consider reporting this to the maintainers of RJavaTools
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Unfortunately, I cannot share the underlying .pdf file that triggered the warning.

billdenney avatar May 07 '19 00:05 billdenney

I reproduced using the first code example from the readme.

library("tabulizer")
f <- system.file("examples", "data.pdf", package = "tabulizer")
out1 <- extract_tables(f)

(Mac 10.13, tabulizer 0.2.2, rJava 0.9-11, R 3.6.0, Java 11.0.1)

fpinter avatar Jun 19 '19 20:06 fpinter

I'm getting the same on Linux too.

bedantaguru avatar Jul 26 '19 10:07 bedantaguru

Just got the same warning. Using R version 3.6.0 (2019-04-26) on Mac OS 10.14.6

Has this caused any actual problems for others?

ziembaej avatar Sep 20 '19 15:09 ziembaej

On Travis it causes a build failure.

bedantaguru avatar Sep 20 '19 15:09 bedantaguru

Was anyone able to solve this? I got the same error.

antonio1970 avatar Oct 16 '19 09:10 antonio1970

Same here

MattCowgill avatar Mar 24 '20 05:03 MattCowgill

Same issue here

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8
 [4] LC_COLLATE=de_DE.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=de_DE.UTF-8        LC_ADDRESS=de_DE.UTF-8
[10] LC_TELEPHONE=de_DE.UTF-8   LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] janitor_1.2.0     tabulizer_0.2.2   data.table_1.12.6 tidytext_0.2.0    dplyr_0.8.3
 [6] stringr_1.4.0     rvest_0.3.4       xml2_1.2.2        selectr_0.4-1     cronR_0.4.0

dernapo avatar Mar 25 '20 15:03 dernapo

I have the same problem. I wonder whether it is related to the "quality" of the document. In other words, Tabulizer works on some PDFs but not on others.

For example, if you download this pdf, Tabulizer works. However, with this one it does not, and I don't know why. I don't think there is anything illegal about the document; I think it comes down to the "quality of the information".

If you write a document in Word or Excel, export it to PDF, and try that, it works. So it seems the Tabulizer algorithm doesn't work on all PDF documents 🧙‍♂️

P.S. I ran this in RStudio 1.2.5033 and R 3.6.3 (2020-02-29)

lefcgis avatar Mar 29 '20 03:03 lefcgis

@lefcgis, there definitely could be some documents that trigger the issue and some that do not, but it is a Java coding issue and not an issue with a PDF file (as in, the pdf standard is being followed). For more information, see https://stackoverflow.com/questions/50251798/what-is-an-illegal-reflective-access
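
Since the warning comes from the JVM rather than from the PDF, one thing that might be worth trying is passing a JVM option through rJava before the JVM starts. The snippet below is only a sketch and has not been confirmed in this thread: it assumes that rJava picks up the java.parameters option at JVM start-up, and that opening the java.util package to unnamed modules covers the access named in the warning (java.util.ArrayList$Itr.hasNext()). It needs a fresh R session, because the option has no effect once the JVM is running.

```r
# Sketch only: JVM options must be set before rJava starts the JVM,
# i.e. before library("tabulizer") in a fresh R session.
# Assumption: opening java.util to unnamed modules covers the access
# reported in the warning; this is not verified for tabulizer.
options(java.parameters = "--add-opens=java.base/java.util=ALL-UNNAMED")

library("tabulizer")
f <- system.file("examples", "data.pdf", package = "tabulizer")
out <- extract_tables(f)
```

If the warning still appears, the access may involve a different package than java.util, in which case the option above would need adjusting.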

billdenney avatar Mar 29 '20 09:03 billdenney

OK! So it's possible that the cause is the JDK and JRE packages, since they are prerequisites for installing rJava. Thanks for your answer, @billdenney 🧙‍♂️

lefcgis avatar Mar 29 '20 10:03 lefcgis

Now it's breaking my build.

bedantaguru avatar May 12 '20 11:05 bedantaguru

For me, this warning only occurs the first time the example code is run in a new R session. Subsequent runs do not show this warning. Is that the same behavior others here are seeing?

The test code I've been using is...

out <- tabulizer::extract_tables(system.file("examples", "data.pdf", package = "tabulizer"))

If so, I'm curious if #125 resolves this issue for you.

cjyetman avatar Nov 16 '20 11:11 cjyetman

Same thing happened to me. I got this error the first time, then just an empty list on each subsequent run. I can read other PDFs, but it fails on one that has a different format.

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tabulizer_0.2.2

loaded via a namespace (and not attached):
[1] tabulizerjars_1.0.1 compiler_4.0.2      tools_4.0.2         rJava_0.9-13       
[5] png_0.1-7 

maahutch avatar Jun 15 '21 16:06 maahutch

@maahutch An error, or a warning? Those are significantly different.

cjyetman avatar Jun 15 '21 22:06 cjyetman