ml_drought
Init/runoff
NOTE:
- EarlyStopping is currently not working because we haven't yet created a train/validation split
Create xy samples dynamically from data loaded into memory
Sorry, this is a huge PR: we have basically re-written the Engineer/DataLoaders/Models to work with data loaded into memory. This is better for hard-disk-constrained modelling problems where the `seq_length` is large (e.g. 365 daily timesteps as input to the LSTM models).
Use the Pipeline for working with runoff data:
- data is 2D instead of 3D (`station_id, time`)
- data is on smaller timesteps than monthly (daily)
- create dynamic engineer
- create dynamic dataloader
- update the EALSTM / Neural Networks to work with DynamicDataLoaders
- new arguments to models: `seq_length`, `target_var`, `forecast_horizon`
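A minimal sketch of what the new arguments control, assuming the simple 2D `(time, station_id)` layout described above (`build_sample` is an illustrative helper, not the repo's actual API):

```python
import numpy as np

def build_sample(data: np.ndarray, seq_length: int, forecast_horizon: int):
    """Cut one (x, y) pair from a 2D (time, station_id) array.

    x = the `seq_length` timesteps ending at t;
    y = the value `forecast_horizon` timesteps after t (here for station 0).
    """
    t = seq_length  # first index with a full history
    x = data[t - seq_length:t]             # shape (seq_length, n_stations)
    y = data[t + forecast_horizon - 1, 0]  # scalar target for station 0
    return x, y

data = np.arange(20, dtype=float).reshape(10, 2)  # 10 timesteps, 2 stations
x, y = build_sample(data, seq_length=3, forecast_horizon=1)
print(x.shape)  # (3, 2)
print(y)        # 6.0
```

With `seq_length=365` this is exactly the "365 daily timesteps as input" case that motivates loading data into memory.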
We have created an experiment file for running the OneTimestepForecast Runoff modelling: scripts/experiments/18_runoff_init.py
Analysis updates
We have added some updates to the analysis code:
- overview: update all rmse/r2 functions to calculate spatial scores (a score for each spatial unit) and temporal scores (a time series of scores for each station)
- add more catching of the inversion problem (it turns out this occurs when the order of `lat, lon` is reversed to `lon, lat`)
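One way to catch the inversion described above is to check dimension order before any reshaping. This is a hedged sketch (the function name and approach are illustrative, not the repo's actual check):

```python
def ensure_lat_lon_order(dims: tuple) -> tuple:
    """Return dims reordered so 'lat' precedes 'lon'; other dims untouched."""
    if "lat" in dims and "lon" in dims and dims.index("lon") < dims.index("lat"):
        dims = list(dims)
        i, j = dims.index("lon"), dims.index("lat")
        dims[i], dims[j] = dims[j], dims[i]  # swap the inverted pair
        return tuple(dims)
    return dims

print(ensure_lat_lon_order(("time", "lon", "lat")))  # ('time', 'lat', 'lon')
print(ensure_lat_lon_order(("time", "lat", "lon")))  # unchanged
```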
Engineer updates
- Create new engineer OneTimestepForecast - src/engineer/one_timestep_forecast.py
- Created a new DynamicEngineer for use with the DynamicDataLoader. NOTE: do we want this, or do we ideally want to generalise the one_month_forecast?
- Major difference is collapsing things not by `lat, lon` but by `dimension_name = [c for c in static_ds.coords][0]`
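The idea of inferring the spatial dimension from the dataset's coordinates (rather than hard-coding `lat, lon`) can be sketched without xarray by modelling the coords as a dict; `infer_spatial_dims` is an illustrative variation, not the repo's code:

```python
def infer_spatial_dims(coords: dict) -> list:
    """Everything that is not 'time' is treated as a spatial dimension."""
    return [c for c in coords if c != "time"]

# 1D spatial case (runoff stations) vs 2D spatial case (pixel grid)
print(infer_spatial_dims({"time": None, "station_id": None}))       # ['station_id']
print(infer_spatial_dims({"time": None, "lat": None, "lon": None})) # ['lat', 'lon']
```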
DataLoader Updates
- `self.get_reducing_dims` to get the spatial dimensions (either lat/lon or area or station_id or whatever is not time!) - aggregations collapse over these reducing dimensions: `global_mean = x.mean(dim=reducing_dims)`
- `build_loc_to_idx_mapping` builds a dictionary to ensure we can track which id relates to which spatial unit
- Various examples of `if len(static_np.shape) == 3:` having to account for 2D spatial information (time, lat, lon) or 1D spatial information (time, station_id)
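The "collapse over the reducing dimensions" idea above can be sketched in plain numpy, assuming time is always axis 0 (this mirrors, but is not, the repo's `get_reducing_dims`):

```python
import numpy as np

def spatial_mean(x: np.ndarray) -> np.ndarray:
    """Mean over every axis except time (axis 0)."""
    reducing_axes = tuple(range(1, x.ndim))  # (1,) for 2D, (1, 2) for 3D
    return x.mean(axis=reducing_axes)

stations = np.ones((4, 3))    # 1D spatial info: (time, station_id)
grid = np.ones((4, 2, 5))     # 2D spatial info: (time, lat, lon)
print(spatial_mean(stations).shape)  # (4,)
print(spatial_mean(grid).shape)      # (4,)
```

Writing the aggregation this way means the same code path handles both the station and pixel cases, which is the point of the `if len(static_np.shape) == 3:` branches mentioned above.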
TODO:
# TODO: why so many static nones?
- This is because the standard deviation of some of the values stored in the `normalizing_dict` becomes 0, so dividing by 0 gives `np.nan`
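The zero-standard-deviation failure mode can be demonstrated in a few lines; the `safe_std` guard is one common fix, offered here as a sketch rather than the repo's chosen solution:

```python
import numpy as np

values = np.array([5.0, 5.0, 5.0])  # a constant (e.g. static) feature
mean, std = values.mean(), values.std()

# Normalising with std == 0 divides 0 by 0 and yields nan everywhere
with np.errstate(invalid="ignore"):
    print(np.isnan((values - mean) / std).all())  # True

safe_std = std if std > 0 else 1.0  # guard against division by zero
normed = (values - mean) / safe_std
print(normed)  # [0. 0. 0.]
```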
Model updates
- `seq_length` // `include_timestep_aggs`
- use a dataloader to load in timesteps: `for x, y in tqdm.tqdm(train_dataloader):`
- `include_monthly_aggs` -> `include_timestep_aggs` = spatial aggregation (map of mean values for that pixel)
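The training-loop pattern above (iterate batches from a loader instead of materialising every timestep up front) can be sketched with a stand-in loader; `DummyLoader` is illustrative and only mimics the iteration contract of the repo's DynamicDataLoader:

```python
import numpy as np

class DummyLoader:
    """Yields (x, y) batches shaped like an LSTM's input."""

    def __init__(self, n_batches: int, seq_length: int):
        self.n_batches, self.seq_length = n_batches, seq_length

    def __iter__(self):
        for _ in range(self.n_batches):
            x = np.zeros((8, self.seq_length, 1))  # (batch, seq_length, features)
            y = np.zeros((8, 1))
            yield x, y

n_seen = 0
for x, y in DummyLoader(n_batches=3, seq_length=365):  # 365 daily timesteps
    n_seen += x.shape[0]
print(n_seen)  # 24 samples across 3 batches of 8
```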