meka
Prediction speed scales with training data size rather than output size
I am running some experiments with the Mulan wrapper. In particular, I added the COCOA method from that repository and am running the following for training:
java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -split-percentage 100 -t "train.arff" -d "clf.dmp" -W weka.classifiers.trees.J48
and for inference:
java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -t "train.arff" -T "test.arff" -l "clf.dmp" -W weka.classifiers.trees.J48
Training time increases moderately, which seems reasonable, as "train.arff" grows. However, with "test.arff" held at a fixed size, inference time scales exponentially with the size of "train.arff". It almost seems as if training is happening during the second command rather than the first. My Java is very rusty, so perhaps that is indeed what is happening. Is this the expected behavior?
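For anyone wanting to reproduce the measurement, here is a rough sketch of how the timing could be collected. The helper names (`arff_head`, `time_seconds`), the subset sizes, and the `MEKA_CP` variable are my own placeholders, and the subsetting assumes a dense ARFF with one instance per line after `@data`.

```shell
#!/bin/sh
# arff_head N FILE: print the ARFF header plus the first N data rows of FILE.
# Assumes a dense ARFF: one instance per line after the @data line.
arff_head() {
  awk -v n="$1" '
    in_data {
      # Skip blank lines and % comments; keep only the first n instances.
      if (NF > 0 && $0 !~ /^%/ && count < n) { print; count++ }
      next
    }
    { print }                               # header lines pass through unchanged
    tolower($1) == "@data" { in_data = 1 }  # everything after this is data
  ' "$2"
}

# time_seconds CMD [ARGS...]: run CMD quietly and print elapsed wall-clock
# time in whole seconds (coarse, but enough to see a trend).
time_seconds() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Hypothetical usage (needs MEKA installed; the sizes are arbitrary):
#   MEKA_CP="$HOME/bin/meka-release-1.9.8-SNAPSHOT/lib/*"
#   for n in 500 1000 2000; do
#     arff_head "$n" train.arff > "train_$n.arff"
#     java -cp "$MEKA_CP" meka.classifiers.multilabel.MULAN -S COCOA \
#       -split-percentage 100 -t "train_$n.arff" -d "clf_$n.dmp" \
#       -W weka.classifiers.trees.J48
#     echo "$n $(time_seconds java -cp "$MEKA_CP" \
#       meka.classifiers.multilabel.MULAN -S COCOA -t "train_$n.arff" \
#       -T test.arff -l "clf_$n.dmp" -W weka.classifiers.trees.J48)"
#   done
```

If the second column grows with `n` while "test.arff" stays fixed, that would be consistent with the model being rebuilt during the second command.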
I just submitted a fix (https://github.com/Waikato/meka/commit/0608eeffd56cbb109902719515b632559e21a6c7) that allows you to evaluate a previously trained model on a test set. This wasn't possible before: the model was always retrained on the training data.
With the latest snapshot, you would use something like this:
java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -threshold 1 -T "test.arff" -l "clf.dmp"
Thanks for the quick fix!
I rebuilt from master, but I'm running into this error now:
java.lang.ArrayIndexOutOfBoundsException: Index 1341 out of bounds for length 1341
    at weka.core.DenseInstance.value(DenseInstance.java:347)
    at mulan.transformations.BinaryRelevanceTransformation.transformInstance(BinaryRelevanceTransformation.java:126)
    at mulan.classifier.transformation.BinaryRelevance.makePredictionInternal(BinaryRelevance.java:83)
    at mulan.classifier.MultiLabelLearnerBase.makePrediction(MultiLabelLearnerBase.java:113)
    at mulan.classifier.transformation.COCOA.makePredictionforThreshold(COCOA.java:305)
    at mulan.classifier.transformation.COCOA.makePredictionInternal(COCOA.java:324)
    at mulan.classifier.MultiLabelLearnerBase.makePrediction(MultiLabelLearnerBase.java:113)
    at meka.classifiers.multilabel.MULAN.distributionForInstance(MULAN.java:263)
    at meka.classifiers.multilabel.Evaluation.testClassifier(Evaluation.java:617)
    at meka.classifiers.multilabel.Evaluation.evaluateModel(Evaluation.java:419)
    at meka.classifiers.multilabel.Evaluation.runExperiment(Evaluation.java:301)
    at meka.classifiers.multilabel.ProblemTransformationMethod.runClassifier(ProblemTransformationMethod.java:172)
    at meka.classifiers.multilabel.ProblemTransformationMethod.evaluation(ProblemTransformationMethod.java:152)
    at meka.classifiers.multilabel.MULAN.main(MULAN.java:273)
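One thing that may be worth ruling out (this is a guess, not a diagnosis) is a mismatch between the headers of "train.arff" and "test.arff", since the trace shows an attribute index one past the end of the instance. A small sketch that compares the declared attribute counts and prints the `@relation` line, where MEKA datasets carry the label count as `-C <n>`; the function names are placeholders of mine:

```shell
#!/bin/sh
# count_attributes FILE: number of @attribute declarations in an ARFF header
# (case-insensitive, leading whitespace allowed).
count_attributes() {
  grep -ci '^[[:space:]]*@attribute' "$1"
}

# relation_line FILE: first @relation line; in MEKA-format ARFF files this
# is where the label count option (-C <n>) is stored.
relation_line() {
  grep -i '^[[:space:]]*@relation' "$1" | head -n 1
}

# same_attribute_count A B: succeeds only if both files declare the same
# number of attributes.
same_attribute_count() {
  [ "$(count_attributes "$1")" -eq "$(count_attributes "$2")" ]
}

# Hypothetical usage with the files from this thread:
#   relation_line train.arff
#   relation_line test.arff
#   if same_attribute_count train.arff test.arff; then
#     echo "attribute counts match"
#   else
#     echo "attribute counts differ"
#   fi
```

If the counts and `-C` values agree, the header is probably not the cause and the problem lies elsewhere.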
Please provide a minimal example that replicates this problem.