ENH: add BaseVocalDataset that uses `vocles`
related to #446
we do already have a base VocalDataset but it's basically just used for prediction
there should be something like a base Dataset class, similar to the hierarchy in torchvision, whose `__init__` expects a path to a vocles dataset and keeps that path as an attribute
Two sub-classes would be AudioDataset and SpectrogramDataset, each of which returns from `__getitem__` the audio or spectrogram plus any corresponding annotation from the row. We could just always return a dict with audio / spect and annot, and let annot be None for unannotated data. This removes the need to have a separate dataset for prediction.
Then e.g. a BFSongRepo dataset would sub-class the SpectrogramDataset?
But then we'd need to actually provide spectrograms :thinking:
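
A rough sketch of the hierarchy described above, just to make the idea concrete. Everything in it beyond the class names is an assumption: it does not use the real `vocles` API, and the `index.csv` file, the `audio_path` / `spect_path` / `annot_path` columns, and the `_load_annot` helper are hypothetical stand-ins. Files are read as raw bytes / text as a placeholder for real audio, spectrogram, and annotation loading, and the classes only implement the map-style `__len__` / `__getitem__` protocol instead of sub-classing `torch.utils.data.Dataset`, so the sketch runs without torch.

```python
import csv
from pathlib import Path


class BaseVocalDataset:
    """Hypothetical base class: keeps the dataset path and loads an index of rows."""

    def __init__(self, dataset_path):
        self.dataset_path = Path(dataset_path)
        # Assumption: the dataset root contains an "index.csv" with one row per sample.
        with open(self.dataset_path / "index.csv", newline="") as fp:
            self.rows = list(csv.DictReader(fp))

    def __len__(self):
        return len(self.rows)

    def _load_annot(self, row):
        # Unannotated rows have an empty (or missing) annot_path, so annot is None.
        annot_path = row.get("annot_path")
        if not annot_path:
            return None
        return (self.dataset_path / annot_path).read_text()


class AudioDataset(BaseVocalDataset):
    def __getitem__(self, idx):
        row = self.rows[idx]
        # Placeholder: raw bytes stand in for decoded audio samples.
        audio = (self.dataset_path / row["audio_path"]).read_bytes()
        return {"audio": audio, "annot": self._load_annot(row)}


class SpectrogramDataset(BaseVocalDataset):
    def __getitem__(self, idx):
        row = self.rows[idx]
        # Placeholder: raw bytes stand in for a loaded spectrogram array.
        spect = (self.dataset_path / row["spect_path"]).read_bytes()
        return {"spect": spect, "annot": self._load_annot(row)}
```

Because every item is a dict whose `annot` may be None, the same class would work for both training and prediction; a collate function would just need to handle the None case.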