JFastText
SIGSEGV on getWords after training
Using the example training data (preprocessed with the classification-example.sh script that ships with fastText), I get a SIGSEGV when calling getWords after training.
Training: ft.runCmd("supervised -input dbpedia.train -output model.bin -dim 100 -lr 0.05 -wordNgrams 2 -minCount 5 -bucket 2000000 -epoch 5".split(" "))
model.bin is generated successfully, and if I load it instead of training, there is no crash. I suspect it's running out of memory, but calling unloadModel before getWords does not help. I tried discarding the trained JFastText object and then calling loadModel, but model.bin seems to be generated asynchronously, so there is no good way to know when to call loadModel.
Crash log: hs_err_pid28676.txt
EDIT: version 0.3 on Mac OSX
Hi siegebell, I tried your command but couldn't reproduce the issue.
Based on the training command line, the output model file should be "model.bin.bin", not "model.bin" (fastText automatically appends the .bin suffix to the output model file). Could you check if you loaded the correct model file?
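To make the naming rule above concrete, here is a minimal sketch that derives the file name one would expect fastText to write from a training command. It only looks for the -output argument and appends ".bin", as described in the comment above; the class and method names (ModelPaths, expectedModelPath) are hypothetical helpers, not part of JFastText.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JFastText.
final class ModelPaths {

    // Returns the path the model is expected to be written to:
    // the value following "-output" with ".bin" appended.
    static Path expectedModelPath(String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("-output".equals(args[i])) {
                return Paths.get(args[i + 1] + ".bin");
            }
        }
        throw new IllegalArgumentException("no -output argument found");
    }

    public static void main(String[] unused) {
        String[] cmd = "supervised -input dbpedia.train -output model.bin -epoch 5".split(" ");
        Path model = expectedModelPath(cmd);
        // With "-output model.bin" the expected file is model.bin.bin.
        System.out.println(model + " exists: " + Files.exists(model));
    }
}
```

Passing the same argument array to both runCmd and a helper like this would keep the file name given to loadModel in sync with the training command.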
@vinhkhuc the training command I gave above was in error; it should be:
ft.runCmd("supervised -input dbpedia.train -output dbpedia -minCount 5 -wordNgrams 2 -bucket 2000000 -lr 0.05 -dim 100 -epoch 5 -thread 8".split(" "))
I've tried deleting and regenerating the normalized training data and the model, but the problem persists. Have you tested this on OS X with JDK 1.8 and still been unable to reproduce it?
system info: macOS Sierra; version 10.12.4; 16 GB memory
@siegebell Yes, I'm using Sierra and Java 8. The following code which calls getWords() works fine for me.
import com.github.jfasttext.JFastText;

public class DebugIssue {
    public static void main(String[] args) {
        JFastText jft = new JFastText();
        jft.runCmd(("supervised " +
                "-input ../cmd/data/dbpedia.train " +
                "-output dbpedia " +
                "-minCount 5 " +
                "-wordNgrams 2 " +
                "-bucket 2000000 " +
                "-lr 0.05 " +
                "-dim 100 " +
                "-epoch 5 " +
                "-thread 8").split(" "));
        jft.loadModel("dbpedia.bin");
        System.out.println(jft.getWords());
    }
}
I do get a SIGSEGV if I comment out the line jft.loadModel("dbpedia.bin");. That's expected: in that case the model is never loaded, so getWords() crashes.
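Since calling getWords() without a loaded model crashes the JVM, one option is to check that the model file exists before loading it, so that a wrong file name fails with an ordinary Java exception. The following is a minimal sketch under that assumption; SafeModelLoader and loadIfPresent are hypothetical names, and the loader is passed in as a Consumer<String> so the sketch does not depend on JFastText itself.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Hypothetical guard, not part of JFastText.
final class SafeModelLoader {

    // Calls the loader only if modelFile is an existing regular file;
    // otherwise throws instead of letting later native calls run unloaded.
    static void loadIfPresent(String modelFile, Consumer<String> loader) {
        Path path = Paths.get(modelFile);
        if (!Files.isRegularFile(path)) {
            throw new IllegalStateException(
                    "model file not found: " + path.toAbsolutePath());
        }
        loader.accept(modelFile);
    }
}
```

With the code above it could be used as SafeModelLoader.loadIfPresent("dbpedia.bin", jft::loadModel); before calling jft.getWords(). This only guards against a missing or misnamed file; it does not detect a corrupt or incompatible model.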