Generating Multi-Label Synthetic Data Stream gives a NullPointerException
Hey There,
The issue described below was discussed previously here: https://groups.google.com/forum/#!topic/moa-development/ho-_Z22k1-E
The task WriteStreamToARFFFile does not work properly with a multi-label generator. Some initial statistics on the distribution of the label sets are printed to the terminal, but the process then terminates with a NullPointerException.
Another user in the MOA Development Google Group has reproduced the error as well.
The error looks like this:
Failure reason: Failed writing to file /home/****/Synth.arff
*** STACK TRACE ***
java.lang.RuntimeException: Failed writing to file /home/****/Synth.arff
    at moa.tasks.WriteStreamToARFFFile.doMainTask(WriteStreamToARFFFile.java:86)
    at moa.tasks.MainTask.doTaskImpl(MainTask.java:50)
    at moa.tasks.AbstractTask.doTask(AbstractTask.java:57)
    at moa.tasks.TaskThread.run(TaskThread.java:76)
Caused by: java.lang.NullPointerException
    at com.yahoo.labs.samoa.instances.SparseInstanceData.locateIndex(SparseInstanceData.java:237)
    at com.yahoo.labs.samoa.instances.SparseInstanceData.setValue(SparseInstanceData.java:220)
    at com.yahoo.labs.samoa.instances.InstanceImpl.setValue(InstanceImpl.java:269)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.generateMLInstance(MetaMultilabelGenerator.java:274)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.nextInstance(MetaMultilabelGenerator.java:228)
    at moa.streams.generators.multilabel.MetaMultilabelGenerator.nextInstance(MetaMultilabelGenerator.java:46)
    at moa.tasks.WriteStreamToARFFFile.doMainTask(WriteStreamToARFFFile.java:80)
    ... 3 more
The settings that produce the error are as follows:
- Pick the 'WriteStreamToARFFFile' task, with these options:
- stream: generators.multilabel.MetaMultilabelGenerator (with default values; I also tried changing some of its options, such as NumLabels and LabelCardinality)
- arffFile: an empty file that I created with proper read and write permissions.
- maxInstances: 100,000, or any other value.
- taskResultFile: left blank, since it only receives the summary of the generated data (most common label set, etc.).
Yes. The same happens to me whenever I run WriteStreamToARFFFile with the MetaMultilabelGenerator.
I finally solved it by forcing the generator to use a dense representation instead of a sparse one. That is, in the method "generateMLInstance" I replaced the following line:
Instance x_ml = new SparseInstance(this.multilabelStreamTemplate.numAttributes());
with:
Instance x_ml = new DenseInstance(this.multilabelStreamTemplate.numAttributes());
And then it works. Don't forget to add the corresponding import.
I changed line 49 of SparseInstance.java from super(1, null, null, (int) numberAttributes); to super((int) numberAttributes);
This calls a different superclass constructor, one that takes only the number of attributes instead of being handed null value and index arrays.
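For anyone wondering why both workarounds avoid the exception, here is a minimal self-contained sketch. It is not MOA's source code: the class and method names (Storage, DenseStorage, SparseStorage, throwsNpe) are hypothetical stand-ins, and it assumes that the sparse storage ends up holding null index and value arrays, which is what the super(1, null, null, ...) call and the NullPointerException in locateIndex suggest. The real SparseInstanceData may differ in detail.

```java
import java.util.Arrays;

// Toy model only; these are NOT the real SAMOA/MOA classes.
class SparseVsDenseSketch {

    /** Hypothetical stand-in for an instance's attribute storage. */
    interface Storage {
        void setValue(int index, double value);
        double value(int index);
    }

    /** Dense storage: one slot per attribute, so setValue needs no lookup. */
    static final class DenseStorage implements Storage {
        private final double[] values;

        DenseStorage(int numAttributes) {
            this.values = new double[numAttributes];
        }

        public void setValue(int index, double value) {
            values[index] = value;
        }

        public double value(int index) {
            return values[index];
        }
    }

    /** Sparse storage: sorted attribute indices plus their stored values. */
    static final class SparseStorage implements Storage {
        private double[] values;
        private int[] indices;

        // Assumption: passing null here mirrors a constructor that receives null arrays.
        SparseStorage(double[] values, int[] indices) {
            this.values = values;
            this.indices = indices;
        }

        private int locateIndex(int index) {
            // Throws NullPointerException if indices is null.
            return Arrays.binarySearch(indices, index);
        }

        public void setValue(int index, double value) {
            int pos = locateIndex(index);
            if (pos >= 0) {
                values[pos] = value;
                return;
            }
            // Not stored yet: insert at the sorted position.
            int ins = -(pos + 1);
            int[] newIndices = new int[indices.length + 1];
            double[] newValues = new double[values.length + 1];
            System.arraycopy(indices, 0, newIndices, 0, ins);
            System.arraycopy(values, 0, newValues, 0, ins);
            newIndices[ins] = index;
            newValues[ins] = value;
            System.arraycopy(indices, ins, newIndices, ins + 1, indices.length - ins);
            System.arraycopy(values, ins, newValues, ins + 1, values.length - ins);
            indices = newIndices;
            values = newValues;
        }

        public double value(int index) {
            int pos = locateIndex(index);
            return pos >= 0 ? values[pos] : 0.0;
        }
    }

    /** Returns true if setting one value on the storage throws a NullPointerException. */
    static boolean throwsNpe(Storage storage) {
        try {
            storage.setValue(3, 1.0);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("sparse, null arrays:  " + throwsNpe(new SparseStorage(null, null)));
        System.out.println("sparse, empty arrays: " + throwsNpe(new SparseStorage(new double[0], new int[0])));
        System.out.println("dense:                " + throwsNpe(new DenseStorage(10)));
    }
}
```

In this toy model only the null-backed sparse storage fails. Switching to dense storage, or making sure the arrays are initialized before the first setValue call, both sidestep the null lookup, which would be consistent with the two workarounds described above.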
An example setting:
- Other task >
- moa.tasks.WriteStreamToARFFFile >
- maxInstances = 5000
- arffFile = %user_given_path
- stream = moa.streams.generators.multilabel.MetaMultilabelGenerator >
- metaRandomSeed = 1
- numLabels = 25
- skew = 0
- labelCardinality = 2
- labelCardinalityVar = 1.5
- labelDependency = 0.25
- labelDependencyRatioChange = 0.2
- binaryGenerator = moa.streams.generators.TextGenerator >
- numAtts = 40,000
- instanceRandomSeed = 1
This issue still persists. I have tried every solution here with the setup given by @JayKumarr, but multi-label stream generation still does not work.
As you can see in the screenshot, I applied @JayKumarr's suggestion with his setup, but it did not solve my problem. I also saved the modified file into the JAR package, so I'm sure this is the code that is running.
Edit: I have also tried @juancard's suggestion, but it did not solve my problem either.