vak icon indicating copy to clipboard operation
vak copied to clipboard

BUG: SpectStandardizer fit in train does not specify a split

Open NickleDave opened this issue 3 years ago • 0 comments

Describe the bug We pass the whole dataset in when we fit the standardizer. https://github.com/vocalpy/vak/blob/7ba22612d876f0a74fed4d930be77a1edf2f6900/src/vak/core/train.py#L251

       spect_standardizer = transforms.StandardizeSpect.fit_df(
            dataset_df, spect_key=spect_key
        )

Not a bug that causes a crash per se but I think technically not correct.
We don't want a model that already knows about the distribution of data in val + test sets. In practice stats of each split are close enough it probably doesn't make much of a difference, I would guess.

Expected behavior We should either filter in place df=dataset_df[datset_df.split == 'train']] or put the train split in a separate df already and then pass that in when we fit the standardizer.

NickleDave avatar Sep 05 '22 00:09 NickleDave