vak
vak copied to clipboard
BUG: SpectStandardizer fit in train does not specify a split
Describe the bug We pass the whole dataset in when we fit the standardizer. https://github.com/vocalpy/vak/blob/7ba22612d876f0a74fed4d930be77a1edf2f6900/src/vak/core/train.py#L251
spect_standardizer = transforms.StandardizeSpect.fit_df(
dataset_df, spect_key=spect_key
)
Not a bug that causes a crash per se but I think technically not correct.
We don't want a model that already knows about the distribution of data in val + test sets.
In practice stats of each split are close enough it probably doesn't make much of a difference, I would guess.
Expected behavior
We should either filter in place df=dataset_df[datset_df.split == 'train']] or put the train split in a separate df already and then pass that in when we fit the standardizer.