zamba

Before training, check user-provided train_labels to make sure it contains a validation split

Open AllenDowney opened this issue 2 years ago • 2 comments

Currently, if the user does not define splits, we generate random splits and check them.

But if the user provides the splits, we don't do any checking. If the user-provided splits put no videos in the validation split, the user gets an error the first time the validation metric is computed. For example:

RuntimeError: Early stopping conditioned on metric `val_macro_f1` which is not available. Pass in or modify your `EarlyStopping` callback to use any of the following: `train_loss`

It would be nice to generate a more helpful message before training starts.

AllenDowney avatar Aug 18 '22 20:08 AllenDowney

This check may be as simple as verifying that there is at least one train video and at least one val video. We will need to test whether that is sufficient when there are species in train that are not in val, and when no holdout set is specified.

ejm714 avatar Aug 18 '22 20:08 ejm714
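
A minimal sketch of the check discussed above, in case it helps frame the fix. It is not the zamba implementation: the function names, the `split` and `label` column names, and the split values `train` and `val` are all assumptions made for illustration. It uses only the standard library, fails before training with a message naming the empty split, and separately reports labels that appear in train but not in val so that case can be inspected rather than silently accepted.

```python
import csv
import io
from collections import Counter

# Assumed split names; adjust to whatever the labels file actually uses.
REQUIRED_SPLITS = ("train", "val")


def check_user_splits(rows, split_column="split"):
    """Raise ValueError if any required split has no videos.

    `rows` is a list of dicts, e.g. from csv.DictReader.
    Returns a Counter mapping split name to number of videos.
    """
    counts = Counter(row[split_column] for row in rows)
    missing = [name for name in REQUIRED_SPLITS if counts[name] == 0]
    if missing:
        raise ValueError(
            "User-provided splits have no videos in: "
            + ", ".join(missing)
            + ". Each of "
            + ", ".join(REQUIRED_SPLITS)
            + " needs at least one video before training can start."
        )
    return counts


def labels_missing_from_val(rows, split_column="split", label_column="label"):
    """Return the sorted labels that occur in train but never in val."""
    train = {r[label_column] for r in rows if r[split_column] == "train"}
    val = {r[label_column] for r in rows if r[split_column] == "val"}
    return sorted(train - val)


# Usage with a small in-memory labels file (hypothetical data).
example_csv = io.StringIO(
    "filepath,label,split\n"
    "a.mp4,lion,train\n"
    "b.mp4,zebra,train\n"
    "c.mp4,lion,val\n"
)
example_rows = list(csv.DictReader(example_csv))
print(check_user_splits(example_rows))
print(labels_missing_from_val(example_rows))
```

Whether a label present in train but absent from val should be an error or only a warning is the open question from the comment above, so the sketch keeps it as a separate report instead of raising.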