
Documentation for Senti4SD-fast.jar

Open maelick opened this issue 5 years ago • 0 comments

I'm trying to use Senti4SD on a large dataset (~100M lines of text) and would like to drive most of the pipeline from R to improve performance. In particular, I'm trying to avoid creating the large CSV file that contains the features.

For that, I want to run Senti4SD on chunks of the data. However, this considerably slows down the whole process, because each time the script is called, Senti4SD-fast.jar needs to reload dsm.bin. To overcome that problem, I want to use rJava to start the JVM from R itself, load dsm.bin once, and run the feature extraction on chunks without writing the results to a file.
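To make the idea concrete, here is a rough sketch of the pattern I have in mind: pay the loading cost once, then extract features chunk by chunk in memory. All names here (`FeatureExtractor`, `loadModelOnce`, `extractChunk`) are hypothetical placeholders, since I don't know what Senti4SD-fast.jar actually exposes, and the "features" are dummy values rather than anything Senti4SD computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkedExtractionSketch {

    // Hypothetical interface: a stand-in for whatever Senti4SD-fast.jar
    // exposes for feature extraction. This is NOT its real API.
    interface FeatureExtractor {
        double[] extract(String text);
    }

    // Counts how often the (expensive) load step runs; the goal is exactly once.
    static int loadCount = 0;

    // Placeholder for loading dsm.bin. A real version would read dsmPath here;
    // this dummy ignores it and returns two toy features per document:
    // the text length and the number of '!' characters.
    static FeatureExtractor loadModelOnce(String dsmPath) {
        loadCount++;
        return text -> new double[] {
            text.length(),
            text.chars().filter(c -> c == '!').count()
        };
    }

    // Extracts features for one chunk and returns them in memory (no CSV file).
    static List<double[]> extractChunk(FeatureExtractor fx, List<String> chunk) {
        List<double[]> rows = new ArrayList<>(chunk.size());
        for (String doc : chunk) {
            rows.add(fx.extract(doc));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Load once, outside the loop over chunks.
        FeatureExtractor fx = loadModelOnce("dsm.bin");

        List<List<String>> chunks = Arrays.asList(
            Arrays.asList("This works great!", "Thanks a lot"),
            Arrays.asList("This is broken!!"));

        for (List<String> chunk : chunks) {
            List<double[]> rows = extractChunk(fx, chunk);
            System.out.println("rows in chunk: " + rows.size());
        }
        System.out.println("model loads: " + loadCount);
    }
}
```

From R, I would expect to start the JVM with rJava's `.jinit()` with the jar on the classpath, create the extractor object once with `.jnew()`, and then call its extraction method once per chunk with `.jcall()`. Which class and method to call is exactly what I'm missing.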

Is there any documentation available that would allow me to call the feature extraction directly through rJava, without creating intermediate files?

maelick • Apr 08 '19 08:04