staplr
staplr copied to clipboard
Make get_fields faster by using rJava
There seems to be an overhead associated with calling Java from the system
function, at least for me on Ubuntu. For get_fields
, I was able to call some of the lower-level pdftk functions directly with rJava, and the result is more than twice as fast on my computer. (All the tests still pass.)
If this change is merged, we may also want to change the onload.R file to use .jpackage
instead of .jinit
. This is relevant to the new rJava code because we have to add the pdftk jar file to the classpath.
Codecov Report
Merging #52 (8afecb3) into master (da138f0) will increase coverage by
0.12%
. The diff coverage is100.00%
.
:exclamation: Current head 8afecb3 differs from pull request most recent head 88ae410. Consider uploading reports for the commit 88ae410 to get more accurate results
@@ Coverage Diff @@
## master #52 +/- ##
==========================================
+ Coverage 86.39% 86.51% +0.12%
==========================================
Files 11 11
Lines 441 445 +4
==========================================
+ Hits 381 385 +4
Misses 60 60
Impacted Files | Coverage Δ | |
---|---|---|
R/fill_pdf.R | 86.63% <100.00%> (+0.25%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update da138f0...88ae410. Read the comment docs.
ah been meaning to implement rjava to make it work with the whole thing. don't quite remember where i got stuck though it's been a while.. will test this out and merge with an intent to change the other functions as well
I was surprised that rJava seems to allow you to call methods from non-public classes, which I couldn't get to work from a regular Java file. So at least for this one it mostly came down to finding the java function, and then finding the right JNI signatures.
Anything I can do to get this merged faster? I think the only impact it has on the rest of the package is that it adds a real dependency on rJava, though it's already listed as a dependency anyway. (Since other functions call Java via system calls, I think they only require Java itself.)
Sorry about the delay, was hoping to make a larger scale review but couldn't find the time. As it stands this should be safe to merge
I'll go ahead and switch .jinit
to .jpackage
so the package is ready for more rJava functions. And here are some notes for anyone who wants to do more like this--
The code run by each pdftk command is in java/pdftk-master/java/com/gitlab/pdftk_java/TK_Session.java
under TK_Session.create_output()
. The definitions of the "report" functions for printing output files are at
java/pdftk-master/java/com/gitlab/pdftk_java/report.java
.
.jcall
requires JNI signatures, which can be found with javap
. For example, to get JNI signatures for staplr pdftk classes, run something like (from the source code directory)
javap -s -classpath inst/pdftk-java/pdftk.jar pdftk/com/lowagie/text/pdf/PdfReader
or
javap -s -classpath inst/pdftk-java/pdftk.jar com/gitlab/pdftk_java/report
where the classfile option is defined relative to java/pdftk-master/java.I thought those were the more confusing parts.