Prediction speed scales with training data size rather than output size

Open davidfstein opened this issue 1 year ago • 3 comments

I am running some experiments with the Mulan wrapper. Particularly, I added the COCOA method from that repository and am running the following for training:

java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -split-percentage 100 -t "train.arff" -d "clf.dmp" -W weka.classifiers.trees.J48

and for inference:

java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -t "train.arff" -T "test.arff" -l "clf.dmp" -W weka.classifiers.trees.J48

Notably, training time increases moderately but reasonably as "train.arff" grows. However, with a fixed "test.arff" size, inference time scales exponentially with "train.arff" size. It seems almost as if training is not actually occurring during the first command but rather in the second. My Java is very rusty, so perhaps that is indeed what is happening. Is this the expected behavior?

davidfstein avatar Jun 14 '23 14:06 davidfstein

I just submitted a fix (https://github.com/Waikato/meka/commit/0608eeffd56cbb109902719515b632559e21a6c7) that will allow you to evaluate a previously trained model on a test set. This wasn't possible before: the model was always retrained with the training data.

With the latest snapshot, you would use something like this:

java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -threshold 1 -T "test.arff" -l "clf.dmp"

fracpete avatar Jun 14 '23 23:06 fracpete

Thanks for the quick fix!

I rebuilt from master, but I'm running into this error now:

java.lang.ArrayIndexOutOfBoundsException: Index 1341 out of bounds for length 1341
    at weka.core.DenseInstance.value(DenseInstance.java:347)
    at mulan.transformations.BinaryRelevanceTransformation.transformInstance(BinaryRelevanceTransformation.java:126)
    at mulan.classifier.transformation.BinaryRelevance.makePredictionInternal(BinaryRelevance.java:83)
    at mulan.classifier.MultiLabelLearnerBase.makePrediction(MultiLabelLearnerBase.java:113)
    at mulan.classifier.transformation.COCOA.makePredictionforThreshold(COCOA.java:305)
    at mulan.classifier.transformation.COCOA.makePredictionInternal(COCOA.java:324)
    at mulan.classifier.MultiLabelLearnerBase.makePrediction(MultiLabelLearnerBase.java:113)
    at meka.classifiers.multilabel.MULAN.distributionForInstance(MULAN.java:263)
    at meka.classifiers.multilabel.Evaluation.testClassifier(Evaluation.java:617)
    at meka.classifiers.multilabel.Evaluation.evaluateModel(Evaluation.java:419)
    at meka.classifiers.multilabel.Evaluation.runExperiment(Evaluation.java:301)
    at meka.classifiers.multilabel.ProblemTransformationMethod.runClassifier(ProblemTransformationMethod.java:172)
    at meka.classifiers.multilabel.ProblemTransformationMethod.evaluation(ProblemTransformationMethod.java:152)
    at meka.classifiers.multilabel.MULAN.main(MULAN.java:273)

davidfstein avatar Jun 15 '23 15:06 davidfstein

Please provide a minimal example that replicates this problem.

fracpete avatar Jun 15 '23 20:06 fracpete
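
When putting together a minimal example, it may help to first rule out a header mismatch. The exception above reports an access at index 1341 on an instance holding 1341 values, that is, one position past the end. Whether the train and test files actually declare different numbers of attributes is not confirmed anywhere in this thread; it is only an assumption worth checking. The sketch below uses the Java standard library only (no Weka or Meka classes) and counts the `@attribute` declarations in each ARFF header. The class name `ArffHeaderCheck` and the method `countAttributes` are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of Meka, Mulan, or Weka.
public class ArffHeaderCheck {

    /** Counts the @attribute declarations that appear before the @data marker. */
    static int countAttributes(Reader source) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // ARFF keywords are case-insensitive, so normalize before matching.
                String trimmed = line.trim().toLowerCase(Locale.ROOT);
                if (trimmed.startsWith("%")) {
                    continue; // comment line
                }
                if (trimmed.startsWith("@data")) {
                    break; // header ends here; the rest is instance data
                }
                if (trimmed.startsWith("@attribute")) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ArffHeaderCheck <train.arff> <test.arff>");
            return;
        }
        int train = countAttributes(Files.newBufferedReader(Paths.get(args[0])));
        int test = countAttributes(Files.newBufferedReader(Paths.get(args[1])));
        System.out.println("train attributes: " + train);
        System.out.println("test attributes:  " + test);
        if (train != test) {
            System.out.println("Attribute counts differ between the two files.");
        }
    }
}
```

With a recent JDK this can be run directly as `java ArffHeaderCheck.java train.arff test.arff`. Matching counts do not prove the headers are compatible (attribute order, types, and the label count could still differ), so this only narrows things down before a reproducible example is posted.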