add support for event-only datasets

Status: Open · dkrako opened this issue 2 months ago · 3 comments

Description of the problem

Many published datasets include only event data (e.g., fixations and saccades) and no raw gaze samples.

It would be great if the pymovements dataset library could support these as well.

Description of a solution

The DatasetDefinition should probably be extended with a new attribute that signifies the type of data to expect.

My first proposal would be datatype: Literal['raw', 'events']. Let's brainstorm whether we can find a better name.
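A minimal sketch of what such an attribute could look like, assuming it defaults to 'raw' so that all existing definitions keep their current meaning. DatasetDefinitionSketch and DataType are hypothetical names for illustration only, not the actual pymovements DatasetDefinition API. Because typing.Literal is not enforced at runtime, the sketch validates the value explicitly.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical alias for the proposed attribute type; the final name is still open.
DataType = Literal['raw', 'events']


@dataclass
class DatasetDefinitionSketch:
    """Hypothetical stand-in for DatasetDefinition with the proposed attribute."""

    name: str
    # Defaulting to 'raw' keeps existing gaze-sample definitions unchanged.
    datatype: DataType = 'raw'

    def __post_init__(self) -> None:
        # Literal is only a static hint, so reject unknown values at runtime.
        allowed = get_args(DataType)
        if self.datatype not in allowed:
            raise ValueError(
                f'datatype must be one of {allowed}, got {self.datatype!r}'
            )
```

An event-only dataset would then be declared as DatasetDefinitionSketch(name='MyEventDataset', datatype='events'), while a typo such as datatype='event' fails early instead of silently loading nothing.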

Additionally, the Dataset.load() method must be changed accordingly:

https://github.com/aeye-lab/pymovements/blob/cb9ef9571c5b24f7609928d18efc3ae2520c1d03/src/pymovements/dataset/dataset.py#L77-L135

Currently, it always runs Dataset.load_gaze_files(); instead, whether it does so should depend on DatasetDefinition.datatype. Likewise, the events argument should default to None (events: bool | None = None) and only be resolved to a boolean at runtime, based on DatasetDefinition.datatype.

I propose the following signature:

    def load(
            self,
            *,
            gaze: bool | None = None,
            events: bool | None = None,
            preprocessed: bool = False,
            subset: dict[str, float | int | str | list[float | int | str]] | None = None,
            events_dirname: str | None = None,
            preprocessed_dirname: str | None = None,
            extension: str = 'feather',
    ) -> Dataset:
        ...  # body omitted; only the signature is proposed here

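This would be backwards compatible: raw datasets would still load gaze samples and skip events by default, while values passed explicitly by the caller are always respected:

    # Inside Dataset.load(): only fill in defaults for arguments left as None.
    if gaze is None:
        gaze = self.definition.datatype == 'raw'
    if events is None:
        events = self.definition.datatype == 'events'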
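To check how the defaults behave, here is a small self-contained sketch that isolates the default resolution as a standalone function which only replaces None, so an explicit gaze=True or events=False from the caller is never overridden. resolve_load_flags and ToyDataset are hypothetical names, and ToyDataset.load_event_files is an assumed stand-in for whichever event loader Dataset.load() would call; the toy loaders only record that they ran.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal


def resolve_load_flags(
    datatype: Literal['raw', 'events'],
    gaze: bool | None = None,
    events: bool | None = None,
) -> tuple[bool, bool]:
    """Turn optional gaze/events arguments into concrete booleans.

    Explicit True/False values are kept as-is; only None is replaced
    by a default derived from the dataset's datatype.
    """
    if gaze is None:
        gaze = datatype == 'raw'
    if events is None:
        events = datatype == 'events'
    return gaze, events


@dataclass
class ToyDataset:
    """Hypothetical minimal dataset that records which loaders were called."""

    datatype: Literal['raw', 'events']
    calls: list[str] = field(default_factory=list)

    def load_gaze_files(self) -> None:
        self.calls.append('gaze')

    def load_event_files(self) -> None:  # hypothetical loader name
        self.calls.append('events')

    def load(
        self, *, gaze: bool | None = None, events: bool | None = None
    ) -> ToyDataset:
        gaze, events = resolve_load_flags(self.datatype, gaze, events)
        if gaze:
            self.load_gaze_files()
        if events:
            self.load_event_files()
        return self
```

With these defaults, ToyDataset('raw').load() runs only the gaze loader (matching the current behavior), ToyDataset('events').load() runs only the event loader, and ToyDataset('raw').load(events=True) runs both.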

Minimum acceptance criteria

  • [ ] add an attribute to DatasetDefinition that indicates the type of data in the dataset
  • [ ] adjust the Dataset.load() default values so that loading gaze or events is decided at runtime

dkrako · Apr 05 '24 12:04