staplr icon indicating copy to clipboard operation
staplr copied to clipboard

Make get_fields faster by using rJava

Open wmay opened this issue 3 years ago • 6 comments

There seems to be an overhead associated with calling Java from the system function, at least for me on Ubuntu. For get_fields, I was able to call some of the lower-level pdftk functions directly with rJava, and the result is more than twice as fast on my computer. (All the tests still pass.)

If this change is merged, we may also want to change the onload.R file to use .jpackage instead of .jinit. This is relevant to the new rJava code because we have to add the pdftk jar file to the classpath.

wmay avatar Mar 05 '21 15:03 wmay

Codecov Report

Merging #52 (8afecb3) into master (da138f0) will increase coverage by 0.12%. The diff coverage is 100.00%.

:exclamation: Current head 8afecb3 differs from pull request most recent head 88ae410. Consider uploading reports for the commit 88ae410 to get more accurate results Impacted file tree graph

@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
+ Coverage   86.39%   86.51%   +0.12%     
==========================================
  Files          11       11              
  Lines         441      445       +4     
==========================================
+ Hits          381      385       +4     
  Misses         60       60              
Impacted Files Coverage Δ
R/fill_pdf.R 86.63% <100.00%> (+0.25%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update da138f0...88ae410. Read the comment docs.

codecov[bot] avatar Mar 05 '21 16:03 codecov[bot]

ah been meaning to implement rjava to make it work with the whole thing. don't quite remember where i got stuck though it's been a while.. will test this out and merge with an intent to change the other functions as well

oganm avatar Mar 05 '21 22:03 oganm

I was surprised that rJava seems to allow you to call methods from non-public classes, which I couldn't get to work from a regular Java file. So at least for this one it mostly came down to finding the java function, and then finding the right JNI signatures.

wmay avatar Mar 06 '21 00:03 wmay

Anything I can do to get this merged faster? I think the only impact it has on the rest of the package is that it adds a real dependency on rJava, though it's already listed as a dependency anyway. (Since other functions call Java via system calls, I think they only require Java itself.)

wmay avatar Oct 28 '21 00:10 wmay

Sorry about the delay, was hoping to make a larger scale review but couldn't find the time. As it stands this should be safe to merge

oganm avatar Oct 28 '21 21:10 oganm

I'll go ahead and switch .jinit to .jpackage so the package is ready for more rJava functions. And here are some notes for anyone who wants to do more like this--

The code run by each pdftk command is in java/pdftk-master/java/com/gitlab/pdftk_java/TK_Session.java under TK_Session.create_output(). The definitions of the "report" functions for printing output files are at java/pdftk-master/java/com/gitlab/pdftk_java/report.java.

.jcall requires JNI signatures, which can be found with javap. For example, to get JNI signatures for staplr pdftk classes, run something like (from the source code directory)

javap -s -classpath inst/pdftk-java/pdftk.jar pdftk/com/lowagie/text/pdf/PdfReader

or

javap -s -classpath inst/pdftk-java/pdftk.jar com/gitlab/pdftk_java/report

where the classfile option is defined relative to java/pdftk-master/java.I thought those were the more confusing parts.

wmay avatar Nov 07 '21 05:11 wmay