measureByteSize() gets called twice in EvaluatePrequential for every stats collection cycle

The relevant section of EvaluatePrequential:
double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); //GBs
RAMHoursIncrement *= (timeIncrement / 3600.0); //Hours
RAMHours += RAMHoursIncrement;
lastEvaluateStartTime = evaluateTime;
learningCurve.insertEntry(new LearningEvaluation(
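For context on the arithmetic above: the increment charges the model's current size in GB for the elapsed interval in hours, i.e. RAM-hours. A minimal standalone sketch of the same formula (class and method names here are illustrative, not MOA's):

```java
public class RamHoursSketch {
    // Accumulate GB-hours: size in bytes, interval in seconds.
    static double ramHoursIncrement(long byteSize, double seconds) {
        double gb = byteSize / (1024.0 * 1024.0 * 1024.0);
        return gb * (seconds / 3600.0);
    }

    public static void main(String[] args) {
        // A 2 GiB model held for 36 seconds costs 2 * 0.01 = 0.02 GB-hours.
        double inc = ramHoursIncrement(2L * 1024 * 1024 * 1024, 36.0);
        System.out.println(inc); // 0.02
    }
}
```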
The two call sites:
- directly, at
double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); //GBs
- indirectly, at
learningCurve.insertEntry(new LearningEvaluation(
where LearningEvaluation() calls model.getModelMeasurements(), which invokes measureByteSize() again for the "model serialized size (bytes)" measurement.
This can cause high computational overhead during periodic stats collection for ensemble methods such as SRP and ARF with a large number of base learners (e.g., 100).
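The measurement name "model serialized size (bytes)" hints at why each call is expensive: computing an object's byte size generally means walking or serializing the entire object graph, so every call costs time proportional to the total model size. A hedged sketch of that general technique using plain Java serialization (this is not MOA's actual measureByteSize() implementation):

```java
import java.io.*;
import java.util.*;

public class SerializedSizeSketch {
    // Serialize an object and count the bytes -- O(size of the object graph).
    static long serializedSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // A toy "ensemble" of 100 base models, each holding some state:
        ArrayList<double[]> ensemble = new ArrayList<>();
        for (int i = 0; i < 100; i++) ensemble.add(new double[10_000]);
        // The size is at least the raw payload (100 * 10,000 doubles):
        System.out.println(serializedSize(ensemble) > 100L * 10_000 * 8); // true
    }
}
```

Calling such a routine twice per stats cycle simply doubles a cost that already scales with the number of base learners.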
Simple test with default SRP parameters and the default stream:
moa.DoTask "EvaluatePrequential -l meta.StreamingRandomPatches -i 100000 -f 10000 -q 10000"
- MOA master 6eacf9b (baseline):
Task completed in 6m24s (CPU time)
- Time after disabling the first call (replacing measureByteSize() with 0.0):
diff --git a/moa/src/main/java/moa/tasks/EvaluatePrequential.java b/moa/src/main/java/moa/tasks/EvaluatePrequential.java
index 8003489..16b51c8 100644
--- a/moa/src/main/java/moa/tasks/EvaluatePrequential.java
+++ b/moa/src/main/java/moa/tasks/EvaluatePrequential.java
@@ -213,7 +213,7 @@ public class EvaluatePrequential extends ClassificationMainTask implements Capab
long evaluateTime = TimingUtils.getNanoCPUTimeOfCurrentThread();
double time = TimingUtils.nanoTimeToSeconds(evaluateTime - evaluateStartTime);
double timeIncrement = TimingUtils.nanoTimeToSeconds(evaluateTime - lastEvaluateStartTime);
- double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); //GBs
+ double RAMHoursIncrement = 0.0 / (1024.0 * 1024.0 * 1024.0); //GBs
RAMHoursIncrement *= (timeIncrement / 3600.0); //Hours
RAMHours += RAMHoursIncrement;
lastEvaluateStartTime = evaluateTime;
Task completed in 5m7s (CPU time)
- Time after disabling both calls:
diff --git a/moa/src/main/java/moa/classifiers/AbstractClassifier.java b/moa/src/main/java/moa/classifiers/AbstractClassifier.java
index f60467d..30636a2 100644
--- a/moa/src/main/java/moa/classifiers/AbstractClassifier.java
+++ b/moa/src/main/java/moa/classifiers/AbstractClassifier.java
@@ -185,7 +185,7 @@ public abstract class AbstractClassifier extends AbstractOptionHandler
measurementList.add(new Measurement("model training instances",
trainingWeightSeenByModel()));
measurementList.add(new Measurement("model serialized size (bytes)",
- measureByteSize()));
+ 0.0));
Measurement[] modelMeasurements = getModelMeasurementsImpl();
if (modelMeasurements != null) {
measurementList.addAll(Arrays.asList(modelMeasurements));
diff --git a/moa/src/main/java/moa/tasks/EvaluatePrequential.java b/moa/src/main/java/moa/tasks/EvaluatePrequential.java
index 8003489..16b51c8 100644
--- a/moa/src/main/java/moa/tasks/EvaluatePrequential.java
+++ b/moa/src/main/java/moa/tasks/EvaluatePrequential.java
@@ -213,7 +213,7 @@ public class EvaluatePrequential extends ClassificationMainTask implements Capab
long evaluateTime = TimingUtils.getNanoCPUTimeOfCurrentThread();
double time = TimingUtils.nanoTimeToSeconds(evaluateTime - evaluateStartTime);
double timeIncrement = TimingUtils.nanoTimeToSeconds(evaluateTime - lastEvaluateStartTime);
- double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); //GBs
+ double RAMHoursIncrement = 0.0 / (1024.0 * 1024.0 * 1024.0); //GBs
RAMHoursIncrement *= (timeIncrement / 3600.0); //Hours
RAMHours += RAMHoursIncrement;
lastEvaluateStartTime = evaluateTime;
Task completed in 3m45s (CPU time)
We could compute the byte size once per cycle and pass the already calculated value down to getModelMeasurementsImpl() instead of measuring again.
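One possible shape for that fix, sketched outside MOA: measure once per stats cycle and reuse the value for both the RAM-hours increment and the size measurement. All names below (Model, statsCycle, recordMeasurement) are hypothetical, not MOA's actual API:

```java
public class CachedSizeSketch {
    interface Model {
        long measureByteSize(); // expensive: walks the whole model
    }

    // Evaluate-loop sketch: measure once, reuse for both consumers.
    static double statsCycle(Model model, double intervalSeconds) {
        long byteSize = model.measureByteSize(); // single call per cycle
        double ramHoursIncrement =
                (byteSize / (1024.0 * 1024.0 * 1024.0)) * (intervalSeconds / 3600.0);
        // Reuse the cached value instead of measuring a second time:
        recordMeasurement("model serialized size (bytes)", byteSize);
        return ramHoursIncrement;
    }

    static void recordMeasurement(String name, double value) {
        System.out.println(name + " = " + value);
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        Model model = () -> { calls[0]++; return 1L << 30; }; // 1 GiB stub
        statsCycle(model, 3600.0);
        System.out.println("measureByteSize calls: " + calls[0]); // 1, not 2
    }
}
```

With this structure the stub model in main() records exactly one measureByteSize() call per cycle, halving the measurement cost without changing the reported values.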
The same issue affects EvaluateInterleavedTestThenTrain.
How to run the tests: test.txt (attached)