[ENH] loaders for MONSTER datasets

Open • baraline opened this issue 4 months ago • 0 comments

Describe the feature or idea you want to propose

Include loaders for the MONSTER datasets in the datasets module.

The only downside is that we would have to add huggingface_hub as an optional dependency. Are there other channels we could load the datasets from, to avoid adding another dependency?
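One possible channel that avoids the extra dependency: files in a Hugging Face dataset repository are also served over plain HTTPS at `https://huggingface.co/datasets/<repo_id>/resolve/<revision>/<filename>`. A minimal sketch using only the standard library is below. `hf_dataset_file_url`, `download_hf_dataset_file` and the cache directory layout are hypothetical names chosen for illustration; unlike `hf_hub_download`, this sketch does no ETag/commit checking, no resumable downloads and no authentication, so it assumes the MONSTER repositories stay public.

```python
import os
import urllib.parse
import urllib.request

# Public endpoint of the Hugging Face Hub.
HF_ENDPOINT = "https://huggingface.co"


def hf_dataset_file_url(repo_id, filename, revision="main"):
    """Return the direct HTTPS download URL of a file in a dataset repository."""
    return (
        f"{HF_ENDPOINT}/datasets/{repo_id}/resolve/{revision}/"
        f"{urllib.parse.quote(filename)}"
    )


def download_hf_dataset_file(repo_id, filename, cache_dir, revision="main"):
    """Download a dataset file once into cache_dir and return its local path.

    Hypothetical helper: no ETag/commit checks, no resumable downloads and
    no authentication, so it only suits public repositories.
    """
    local_dir = os.path.join(cache_dir, repo_id.replace("/", "--"), revision)
    local_path = os.path.join(local_dir, filename)
    if not os.path.exists(local_path):
        os.makedirs(local_dir, exist_ok=True)
        # Download to a temporary name so an interrupted transfer is not cached.
        tmp_path = local_path + ".part"
        urllib.request.urlretrieve(
            hf_dataset_file_url(repo_id, filename, revision), tmp_path
        )
        os.replace(tmp_path, local_path)
    return local_path
```

In `load_monster`, each `hf_hub_download(...)` call could then be swapped for `download_hf_dataset_file(repo_id, filename, cache_dir)`. The trade-off is re-implementing caching and error handling that `huggingface_hub` already provides.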

Describe your proposed solution

Code to load the datasets from Hugging Face:

import logging

import numpy as np
from aeon.utils.numba.general import z_normalise_series_3d
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

logger = logging.getLogger(__name__)

univariate_monster_datasets = [
    "CornellWhaleChallenge",
    "AudioMNIST",
    "WhaleSounds",
    "Pedestrian",
    "FruitFlies",
    "AudioMNIST-DS",
    "Traffic",
    "LakeIce",
    "MosquitoSound",
    "InsectSound",
]


def load_monster(dataset_name, fold, normalize=True):
    """Load one train/test fold of a MONSTER dataset from the Hugging Face Hub.

    Returns (X_train, y_train, X_test, y_test), where X has shape
    (n_cases, n_channels, n_timepoints).
    """
    repo_id = f"monster-monash/{dataset_name}"

    # Download data
    data_path = hf_hub_download(
        repo_id=repo_id, filename=f"{dataset_name}_X.npy", repo_type="dataset"
    )
    # Read-only memory map: (#Samples, #Channels, #Length)
    X = np.load(data_path, mmap_mode="r")
    if normalize:
        # Normalisation produces a new, fully in-memory array
        X = z_normalise_series_3d(X)
    # Download labels: the file is named either "<name>_Y.npy" or "<name>_y.npy"
    try:
        label_path = hf_hub_download(
            repo_id=repo_id, filename=f"{dataset_name}_Y.npy", repo_type="dataset"
        )
    except EntryNotFoundError:
        label_path = hf_hub_download(
            repo_id=repo_id, filename=f"{dataset_name}_y.npy", repo_type="dataset"
        )
    y = np.load(label_path)
    # Load test indices for the requested fold
    try:
        test_index_path = hf_hub_download(
            repo_id=repo_id,
            filename=f"test_indices_fold_{fold}.txt",
            repo_type="dataset",
        )
        test_index = np.loadtxt(test_index_path, dtype=int)
    except Exception as e:
        logger.error("Failed to load test indices for fold %s: %s", fold, e)
        raise

    # Boolean mask: True for test cases, False for train cases
    test_bool_index = np.zeros(len(y), dtype=bool)
    test_bool_index[test_index] = True
    return (
        X[~test_bool_index],
        y[~test_bool_index],
        X[test_bool_index],
        y[test_bool_index],
    )
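In `load_monster`, `normalize=True` passes the memory-mapped `X` to `z_normalise_series_3d`, so the whole dataset ends up in memory before the train/test split. If that becomes a problem for the larger MONSTER datasets, one option is to normalise in chunks. The sketch below assumes per-series, per-channel z-normalisation with constant channels only mean-centred, which is our reading of `z_normalise_series_3d`, not a confirmed equivalence. `z_normalise_in_chunks` is a hypothetical name, and the float32 output is a choice made here to save memory.

```python
import numpy as np


def z_normalise_in_chunks(X, chunk_size=1024, dtype=np.float32):
    """Z-normalise each channel of each series in a 3D array, chunk by chunk.

    Hypothetical helper: X has shape (n_cases, n_channels, n_timepoints) and
    may be a read-only np.memmap. Only chunk_size cases are held as float64
    at a time; the output array itself is still fully in memory.
    """
    out = np.empty(X.shape, dtype=dtype)
    for start in range(0, X.shape[0], chunk_size):
        stop = min(start + chunk_size, X.shape[0])
        # Copy the current slice of cases into a float64 working array.
        chunk = np.asarray(X[start:stop], dtype=np.float64)
        mean = chunk.mean(axis=2, keepdims=True)
        std = chunk.std(axis=2, keepdims=True)
        # Constant channels: divide by 1 so they are only mean-centred.
        std[std == 0] = 1.0
        out[start:stop] = (chunk - mean) / std
    return out
```

Peak memory is then roughly the output array plus one float64 chunk, rather than a full-size working copy.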

Describe alternatives you've considered, if relevant

No response

Additional context

No response

baraline, Jul 31 '25 12:07