nowcasting_dataset

Run validation script at the end of `prepare_ml_data.py`?

Open JackKelly opened this issue 3 years ago • 2 comments

Detailed Description

Maybe we should always validate the on-disk batches?

(Let's wait for PR #300 to be merged before working on this)

JackKelly, Nov 01 '21 16:11

The validation script is 'a bit' / 'a lot' out of date. It'll need some work to update. The good thing is that the Batch validates each data source as we go.

Perhaps an easy change to make would be to validate that the t0_datetimes are in separate groups for the train, validation and test sets.
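
A minimal sketch of what such a check might look like, assuming each split's t0_datetimes can be loaded as a plain iterable of datetimes. The names `assert_t0_datetimes_disjoint` and `example_splits` are hypothetical and are not part of nowcasting_dataset; the real validation script may need to read the datetimes from the on-disk batches first.

```python
from datetime import datetime, timedelta
from itertools import combinations


def assert_t0_datetimes_disjoint(splits):
    """Raise ValueError if any t0_datetime appears in more than one split.

    `splits` maps a split name (e.g. "train") to an iterable of datetimes.
    This helper is illustrative only, not an existing nowcasting_dataset API.
    """
    as_sets = {name: set(t0s) for name, t0s in splits.items()}
    # Compare every pair of splits: train/validation, train/test, validation/test.
    for (name_a, set_a), (name_b, set_b) in combinations(as_sets.items(), 2):
        overlap = set_a & set_b
        if overlap:
            raise ValueError(
                f"{len(overlap)} t0_datetimes shared between "
                f"{name_a!r} and {name_b!r}, e.g. {min(overlap)}"
            )


# Made-up example data: twelve half-hourly t0_datetimes split three ways.
start = datetime(2021, 1, 1)
t0_datetimes = [start + timedelta(minutes=30 * i) for i in range(12)]
example_splits = {
    "train": t0_datetimes[:8],
    "validation": t0_datetimes[8:10],
    "test": t0_datetimes[10:],
}
assert_t0_datetimes_disjoint(example_splits)  # no overlap, so nothing is raised
```

Using sets keeps the pairwise comparison cheap, and the error message names the two offending splits so a failure is easy to trace.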

peterdudfield, Nov 16 '21 17:11

> validate that the t0_datetimes are in separate groups for the train, validation and test sets

I completely agree! I think I implemented this here: https://github.com/openclimatefix/nowcasting_dataset/blob/main/nowcasting_dataset/dataset/split/split.py#L189

JackKelly, Nov 16 '21 19:11