
Init/runoff

tommylees112 opened this issue 4 years ago · 0 comments

NOTE:

  • EarlyStopping is currently not working because I haven't created a train/validation split yet

Create xy samples dynamically from Data loaded into memory

Sorry, this is a huge PR: we have basically re-written the Engineer / DataLoaders / Models to work with data loaded into memory. This is better for disk-constrained modelling problems where seq_length is large (e.g. 365 daily timesteps as input to the LSTM models).

Use the Pipeline for working with runoff data.

  • data is 2D (time, station_id) instead of 3D (time, lat, lon)
  • data is at a finer timestep than monthly (daily)
  • create dynamic engineer
  • create dynamic dataloader
  • update the EALSTM / Neural Networks to work with DynamicDataLoaders
  • new arguments to models = 'seq_length', 'target_var', 'forecast_horizon'
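As a rough sketch of how the new model arguments fit together (the DynamicModelConfig class here is hypothetical, for illustration only; the repo's models take these as constructor keyword arguments):

```python
from dataclasses import dataclass

@dataclass
class DynamicModelConfig:
    # Hypothetical container for the three new model arguments.
    seq_length: int         # number of input timesteps, e.g. 365 daily steps
    target_var: str         # variable the model predicts, e.g. "discharge"
    forecast_horizon: int   # how many timesteps ahead to forecast

config = DynamicModelConfig(seq_length=365, target_var="discharge", forecast_horizon=1)
```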

We have created an experiment file for running the OneTimestepForecast Runoff modelling: scripts/experiments/18_runoff_init.py

Analysis updates

We have added some updates to the analysis code:

  • overview: update all rmse/r2 functions to calculate spatial scores (one score for each spatial unit) and temporal scores (a time series of scores for each station)
  • add more catching of the inversion problem (it turns out it occurs when the order of lat, lon is reversed to lon, lat)
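A minimal sketch of the spatial-vs-temporal score distinction, assuming (time, station_id)-shaped xarray data (these helper names are illustrative, not the repo's actual functions):

```python
import numpy as np
import xarray as xr

def spatial_rmse(pred: xr.DataArray, obs: xr.DataArray) -> xr.DataArray:
    """One RMSE score per spatial unit (collapses the time dimension)."""
    return np.sqrt(((pred - obs) ** 2).mean(dim="time"))

def temporal_rmse(pred: xr.DataArray, obs: xr.DataArray) -> xr.DataArray:
    """A time series of RMSE scores (collapses the spatial dimensions)."""
    spatial_dims = [d for d in pred.dims if d != "time"]
    return np.sqrt(((pred - obs) ** 2).mean(dim=spatial_dims))
```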

Engineer updates

  • Created a new engineer OneTimestepForecast - src/engineer/one_timestep_forecast.py
  • Created a new DynamicEngineer for use with the DynamicDataLoader. NOTE: do we want this, or do we ideally want to generalise the one_month_forecast?
  • The major difference is collapsing not by lat, lon but by dimension_name = [c for c in static_ds.coords][0]
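For illustration, the coordinate-based collapse looks roughly like this (the toy static_ds below is made up; only the dimension_name line comes from the PR):

```python
import numpy as np
import xarray as xr

# Toy stand-in for the static dataset: one variable indexed by station_id.
static_ds = xr.Dataset(
    {"elevation": ("station_id", np.array([100.0, 250.0, 40.0]))},
    coords={"station_id": [1, 2, 3]},
)

# Instead of hard-coding lat/lon, take the first coordinate as the
# spatial dimension to collapse by (works for station_id data too).
dimension_name = [c for c in static_ds.coords][0]
```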

DataLoader Updates

  • self.get_reducing_dims gets the spatial dimensions (lat/lon, area, station_id - whatever is not time!)
  • aggregations collapse over these reducing dimensions: global_mean = x.mean(dim=reducing_dims)
  • build_loc_to_idx_mapping builds a dictionary so we can track which id relates to which spatial unit
  • Various examples of if len(static_np.shape) == 3: to account for 2D spatial information (time, lat, lon) vs. 1D spatial information (time, station_id)
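A hedged sketch of the reducing-dims idea (the function name comes from the PR, but this standalone version is an approximation of the method on the loader):

```python
import numpy as np
import xarray as xr

def get_reducing_dims(da: xr.DataArray) -> list:
    """The spatial dimensions to collapse over: everything that is not time.

    For gridded data this returns ["lat", "lon"]; for the runoff data
    it returns ["station_id"].
    """
    return [d for d in da.dims if d != "time"]

x = xr.DataArray(np.random.rand(10, 3), dims=("time", "station_id"))
reducing_dims = get_reducing_dims(x)
global_mean = x.mean(dim=reducing_dims)  # one aggregate value per timestep
```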

TODO: why so many static NaNs?

  • This is because the standard deviation of some of the values stored in the normalizing_dict is 0, so dividing by 0 gives np.nan
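A minimal reproduction of why a zero standard deviation yields NaNs (the zero-std guard at the end is a common fix, not necessarily what the repo does):

```python
import numpy as np

# A constant variable has std == 0 in the normalizing dict...
values = np.array([5.0, 5.0, 5.0])
mean, std = values.mean(), values.std()   # std == 0.0

# ...so standardising divides 0 by 0 and produces np.nan everywhere.
with np.errstate(invalid="ignore"):
    normed = (values - mean) / std        # [nan, nan, nan]

# Common guard: replace a zero std with 1 before dividing.
safe_std = std if std != 0 else 1.0
safe_normed = (values - mean) / safe_std  # [0., 0., 0.]
```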

Model updates

  • new seq_length / include_timestep_aggs arguments
  • use a dataloader to load data in timesteps: for x, y in tqdm.tqdm(train_dataloader):
  • renamed include_monthly_aggs -> include_timestep_aggs = spatial aggregation (map of mean values for that pixel)
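The timestep-batched training loop described above looks roughly like this (the dataloader and "model" here are numpy stand-ins, not the repo's classes; in the PR the loop is wrapped as for x, y in tqdm.tqdm(train_dataloader):):

```python
import numpy as np

# Stand-in dataloader: yields (x, y) batches of shape (batch, features).
rng = np.random.default_rng(0)
train_dataloader = [
    (rng.normal(size=(8, 5)), rng.normal(size=8)) for _ in range(3)
]

# Stand-in "model": a linear layer trained with one SGD step per batch.
weights = np.zeros(5)
lr = 0.01
for x, y in train_dataloader:          # wrapped in tqdm.tqdm(...) in the PR
    preds = x @ weights
    grad = x.T @ (preds - y) / len(y)  # gradient of the MSE loss
    weights -= lr * grad
```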

tommylees112 · Mar 04 '20 11:03