DemonsP300 has only targets in the first third of data

Open jsosulski opened this issue 4 years ago • 6 comments

See this plot for one subject, but all subjects look the same:

The x-axis is the epoch index and the y-axis is the label of that epoch.

[Image: plot of epoch label against epoch index]

jsosulski avatar Jul 09 '21 06:07 jsosulski

Thanks! @v-goncharenko do you know if there is a data loading problem?

sylvchev avatar Jul 10 '21 07:07 sylvchev

The rest of the ground-truth labels are in a .csv file. Look at the code; they are read there.

Also, it's a good idea not to read the raw unstructured data directly, but to use the final class (which is an abstraction between the internal format and the common one).

v-goncharenko avatar Jul 15 '21 08:07 v-goncharenko

So is this an issue with the dataset implementation?

See this MWE, which uses the current moabb version on PyPI:

from matplotlib import pyplot as plt
from moabb.datasets import DemonsP300
from moabb.paradigms import P300

paradigm = P300()

dset = DemonsP300()
subject = 0

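# Load epochs, labels, and metadata through the paradigm
X, label, meta = paradigm.get_data(dset, [subject])

# Plot the label of each epoch in loading order
plt.plot(label)
plt.show()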

This produces the plot in the first post, and it uses the default MOABB way of loading data.
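To put a number on what the plot shows, here is a minimal sketch that counts target epochs in each third of the label sequence. The helper name `target_counts_by_segment` is hypothetical and not part of MOABB, and the label string "Target" is an assumption about what the paradigm returns. The example runs on synthetic labels so it does not need the dataset.

```python
def target_counts_by_segment(labels, n_segments=3, target="Target"):
    """Count target epochs in each of n_segments consecutive chunks.

    Hypothetical helper, not a MOABB function. `labels` is any sequence
    of label values in the order the epochs were loaded.
    """
    n = len(labels)
    counts = []
    for i in range(n_segments):
        # Integer boundaries so the chunks cover the whole sequence
        start = i * n // n_segments
        stop = (i + 1) * n // n_segments
        counts.append(sum(1 for lab in labels[start:stop] if lab == target))
    return counts


# Synthetic labels mimicking the reported pattern: targets only at the start
labels = ["Target", "NonTarget"] * 5 + ["NonTarget"] * 20
print(target_counts_by_segment(labels))  # → [5, 0, 0]
```

With the real data, the `label` array from the MWE could be passed in place of the synthetic list. Zero counts in the later segments would match the plot.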

jsosulski avatar Jul 15 '21 08:07 jsosulski

I have noticed that more and more papers reference MOABB (yay!), although most authors only use it to obtain datasets. That is still a win, I guess, until we have a centralized system for running classification benchmarks. However, should we start tagging datasets that we have vetted and confirmed to work correctly? New MOABB users could then use it as intended, in a fire-and-forget way.

See e.g. this issue or the fixed #96. New users who just want to check their classifier's performance on X, y data probably do not want to dig deep into the underlying datasets and check that everything is doing what it should. The documentation currently gives no hint that this dataset has open issues.

I could offer to clean up the sanity check script (#184), commit it to moabb, and run it locally for all available P300 datasets, as I am most experienced with ERP data.

jsosulski avatar Dec 01 '21 08:12 jsosulski

Good for the citations ;) I tried to add the papers I found that reference MOABB to this wiki page; it could be useful soon. Feel free to add some papers if you have the time.

I agree that the datasets should be verified, and this issue has been open for quite some time. I'm trying to improve the documentation by adding more information on the datasets. As ground truth, I am updating a wiki page with dataset metadata that is useful for ML. As you suggest, we could at a minimum include references to the issues that are open for each dataset.

The best approach would be to ensure that all datasets are OK before adding them, and your sanity check script could really help. Running it could be one of the required steps before a dataset is added. If you could run it on the P300 datasets, that would be nice. Could someone help check the existing MOABB datasets for MI and SSVEP? @Div12345 @ErikBjare @v-goncharenko (I could help)

sylvchev avatar Dec 03 '21 08:12 sylvchev

This issue is stalling, and users could use DemonsP300 without knowing that there is a problem. We could add a warning when the dataset is loaded that references this issue, and we could update the documentation as well. If the problem with this dataset cannot be fixed, we may have to deprecate it.
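As a rough sketch of the warning idea, something like the following could run when a dataset is loaded. The names `KNOWN_ISSUES` and `warn_if_known_issue` are hypothetical and not existing MOABB code, and where this would hook into the dataset class is left open.

```python
import warnings

# Hypothetical registry mapping dataset names to a short issue description
KNOWN_ISSUES = {
    "DemonsP300": "targets appear only in the first third of the epochs",
}


def warn_if_known_issue(dataset_name, known_issues=KNOWN_ISSUES):
    """Emit a UserWarning if the dataset has a registered open issue.

    Hypothetical helper, not part of MOABB. Returns True if a warning
    was emitted and False otherwise.
    """
    description = known_issues.get(dataset_name)
    if description is None:
        return False
    warnings.warn(
        f"{dataset_name} has an open issue: {description}. "
        "Results may be unreliable.",
        UserWarning,
        stacklevel=2,
    )
    return True


warn_if_known_issue("DemonsP300")
```

Keeping the messages in one registry would also give the documentation a single place to pull the per-dataset issue notes from.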

sylvchev avatar Mar 02 '22 10:03 sylvchev