
Bad @DATA instance format: cannot handle data instances that end with a comma

Open Anjin-Liu opened this issue 5 years ago • 2 comments

Hi, Happy new year!

I recently used Weka to generate some .arff files. The files look like this:

@relation 'SEA'

@attribute attrib1 numeric
@attribute attrib2 numeric
@attribute attrib3 numeric
@attribute class {groupA,groupB}

@data

7.30967787376657,2.4053641567148585,6.374174253501082,groupB,
1.1700660880722513,7.815346320453048,2.5277616657598587,groupB,
9.84841540199809,8.791825178724801,9.412491794821143,groupB,
3.1293596519376554,3.6797575871052812,7.051747444754559,groupA,

Each row ends with a comma. Weka reads these files correctly, but liac-arff cannot load them. liac-arff reports "Bad @DATA instance format in line 10: 7.30967787376657,2.4053641567148585,6.374174253501082,groupB,"

After removing the trailing commas, the file loads fine.

So I think this might be an inconsistency with Weka, which is why I am submitting this issue.

Anjin-Liu avatar Jan 06 '20 01:01 Anjin-Liu

Can you give more information on how you generated this? This appears to contradict the specs.

jnothman avatar Jan 07 '20 12:01 jnothman

Hi jnothman,

Sorry for the late reply. I actually used MOA, the machine learning software for data streams (https://moa.cms.waikato.ac.nz/), to generate the ARFF files.

I used the SEAGenerator:

SEAGenerator seaG1 = new SEAGenerator();
seaG1.nextInstance().getData().toString();

The comma at the end of each instance can easily be removed by modifying the generated instance string. My main concern is that Weka can load ARFF files with a comma at the end of each instance, but liac-arff cannot. This is not a big issue, but I think liac-arff should be able to load such ARFF files the same way Weka does.

Best,

Anjin-Liu avatar Jan 09 '20 03:01 Anjin-Liu
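
For anyone hitting the same error, the workaround described in this thread (removing the trailing comma from each instance) can be sketched in Python roughly as follows. This is only a minimal sketch, not part of liac-arff: the helper name `strip_trailing_commas` is hypothetical, and it assumes dense (non-sparse) rows in which a final comma is never a meaningful part of the last value.

```python
def strip_trailing_commas(arff_text: str) -> str:
    """Return ARFF text with one trailing comma removed from each @data row.

    Hypothetical helper for illustration; it is not part of liac-arff.
    Header lines (everything up to and including @data) are left as they
    are, apart from trailing whitespace.
    """
    cleaned = []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.rstrip()
        if in_data:
            is_comment = stripped.lstrip().startswith("%")
            # Only touch non-empty, non-comment rows that end with a comma.
            if stripped and not is_comment and stripped.endswith(","):
                stripped = stripped[:-1]
        elif stripped.strip().lower() == "@data":
            # Rows after this marker are data instances.
            in_data = True
        cleaned.append(stripped)
    return "\n".join(cleaned) + "\n"


# Small sample modeled on the file in this issue (values shortened).
sample = (
    "@relation 'SEA'\n"
    "\n"
    "@attribute attrib1 numeric\n"
    "@attribute class {groupA,groupB}\n"
    "\n"
    "@data\n"
    "\n"
    "7.3,groupB,\n"
    "1.1,groupA,\n"
)
cleaned_sample = strip_trailing_commas(sample)
```

The cleaned text could then be handed to liac-arff, for example with `arff.loads(cleaned_sample)`. Whether liac-arff itself should accept such rows is the open question of this issue.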