hydromt icon indicating copy to clipboard operation
hydromt copied to clipboard

Tracking issue for integrating and harmonising Model and components

Open savente93 opened this issue 4 months ago • 5 comments

Kind of request

Changing existing functionality

Enhancement Description

Requirements:

  • Model needs to be able to add components etc.
  • #813 should be solved to be able to create grid components
  • #821 so we can harmonize the components
  • CLI/yaml users need a way to list the needed components
  • Should user be able to determine model/target CRS at init? (#827 )
  • What are the abstract methods of ModelComponent ?

Use case

No response

Additional Context

- init_model
	components:
		- grid
			kind: GridComponent
		- subgrid:
			kind: GridComponent
		- config:
			kind: ConfigComponent
		- geoms:
			kind: RegionComponent

savente93 avatar Mar 08 '24 08:03 savente93

After a discussion with @deltamarnix @Tjalling-dejong and @savente93 we came up with the following proposal for how the yaml should look going forward (names could still be changed if necessary):

components:
    - grid: GridComponent
    - sub_grid: GridComponent
    - region: RegionComponent
    - ....
 steps:
    - create:
        - name: grid
        - kind: GridComponent
        - [further kwargs] 
    - create:
        - name: sub_grid
        - kind: GridComponent
        - [further kwargs] 
    # - create:
        # - name: subb_grid <- would error because of incorrectly spelled name
        # - kind: GridComponent
        # - [further kwargs] 
    ....  
    - write

This would mean that the names of the component and their types/kinds have to be declared up front in a model config. This has the benefit that we can do type checking (are functions available for the component they are called for), provide feedback if users mistype component names, and being a clear overview of which parts are included in a model for someone looking at a new project.

This is also where we could check several things about the names of components including:

  • they are unique
  • they are valid attribute names (so that if a user specifies we have a grid component they can access it as model.grid
  • all components referenced in the steps exist

Given that components are so interdependent (e.g. each component should read the model mode from the model) it seems pretty clear that components should not be used outside of the model context. To facilitate this, as well as the very diverse arguments needed to initialise them, the initial list of components will just add empty components to the model. These components should then be populated with data by function calls in the steps part. This means that the model would be responsible for creating and registering the created components. The upside of this is that we can do a lot of the leg work such as providing the components with the right references and type checking for the user.

savente93 avatar Mar 11 '24 14:03 savente93

We've been playing around, also with the write step and found that it would probably be easier to stick closer to the Github actions syntax. Where with will be the kwargs that go to the specific step. on will be the component to call the function on, and step is the name of the method to call.

components:
    - grid: GridComponent
    - subgrid: GridComponent
steps:
    - step: create
      on: grid
      with:
        shape: [10, 10]
        dtype: float32
        fill: 0.0
    - step: create
      on: subgrid
      with:
        shape: [10, 10]
        dtype: float32
        fill: 0.0

deltamarnix avatar Mar 11 '24 16:03 deltamarnix

We did some further refining still and so far have landed on this:

components:
    grid:
        type: GridComponent
        filename: grid.nc
    subgrid:
        type: GridComponent
        filename: subgrid.nc
steps:
    - step: create
      on: grid
      with:
        shape: [10, 10]
        dtype: float32
        fill: 0.0
    - step: create
      on: subgrid
      with:
        shape: [10, 10]
        dtype: float32
        fill: 0.0

This makes it very clear what things are for hydromt and which are for the underlying functions. While I'm fine with changing the names if others think something else will be clearer, but I do think the approach is a good one. The one thing we still have to discuss is how we determine which components to write at the end. Do we want to maintain the current implementation of writing everything unless a user specifies which to write? if so, how do we determine how to write them? (e.g. if the user doesn't specify anything, how do we determine what the kwargs et. al. should be)

savente93 avatar Mar 11 '24 17:03 savente93

Hi all, nice thinking!

And indeed it makes sense to initialize the (empty) components at the beginning and then populate them through Model. I also like to add a general steps and list all the model methods to call. It avoids duplicated name and I think it is also clearer that methods are executed in the order the user lists them in the config!

A few extra comments:

  • model init: at the beginning, instead of just listing components, the user should just be able to change any model argument init like for example data_libs or others. So I would suggest something more general so that the model can just be fully initialized and not just the components. So far we are using the keyword global for this but not sure if it is the best name.
  • component init: also good if the components and init attributes can be specified from config. Filename can be useful to change (so far only the config filename was supported in model init with config_fn argument). So I like the way this is implemented now!
  • model steps: most methods in plugins can change several components so for example very few WflowModel methods will use the on argument so I wonder if we do need it (example below with setup_reservoirs that updates grid. geoms and config). What is the disadvantage of calling just grid.create and simplifying the yaml file?
  • write: so in the current implementation if the user does not specify any kwargs we just write the whole model files using defaults. Plugins make sure the default work for their own model. The thing we should encourage is that the write method of core just has a components argument which list the components to write with default argument. So if you want to change the defaults for one, you can just take this one out of the general write and then call its specific method.
  • write behavior: if we are going to have steps now, I wonder if we do not encourage the user to always add write in the steps to write the model. It breaks current behavior but it would maybe make it clearer.

So to recap, here is what the yaml file could look like:

global:
  data_libs: deltares_data
  components:
    grid:
      type: GridComponent
    geoms:
      type: GeomsComponent
    config:
      type: ConfigComponent
      filename: wflow_sbm_calibrated.toml

steps:
  - grid.create:
        shape: [10, 10]
        dtype: float32
        fill: 0.0
  - setup_reservoirs:
       reservoirs_fn: hydro_reservoirs
       min_area: 1.0
  - write:
      components: 
        - grid
        - config
  - geoms.write:
      filename: geoms/*.gpkg
      driver: GPKG

Well while writing this example, I think the component init actually does not look so good. I kind of hope that most users by using a plugin will not have to initialize components themselves as this is in my view quite complex. But right now, in order to change the default filename of one component, all components need to be listed still... Why we only supported changing the name of config in config_fn is actually because for most models, the names of other components are either fixed or derived from the config file. So maybe we could keep the current behavior and only allow to change the config filename in the Model init base argument. Anyhow components don't really have init attributes apart from the model weakref so we could also think of simplifying by just giving the components list as in the first example and keep the current workarounds. E.g if the grid filename was not in the config, you could (fictional example):

global:
  config_fn: wflow_sbm_calibrated.toml

steps:
  - grid.read:
      filename: staticmaps_calibrated.nc
  - setup_reservoirs:
      reservoirs_fn: hydro_lakes
      min_area: 1.0
  - write:
      components:
        - config
        - geoms
 - grid.write:
     filename: staticmaps_calibrated_reservoirs.nc

This is potentially a significant change for the user so I wonder if we should not maybe open a discussion, suggest two or three options with before/after and plugin/no plugin and post in the hydromt chat to get votes and feedback?

hboisgon avatar Mar 12 '24 02:03 hboisgon

Another refinement with @deltamarnix @hboisgon @savente93 yielded the following proposal:

model_type: wflow # <- entrypoint
global:
  data_libs: deltares_data
  components:
    config:
      filename: wflow_sbm_calibrated.toml

steps:
  - grid.create:
        shape: [10, 10]
        dtype: float32
        fill: 0.0
  - setup_reservoirs:
       reservoirs_fn: hydro_reservoirs
       min_area: 1.0
  - write:
      components: 
        - grid
        - config
  - geoms.write:
      filename: geoms/*.gpkg
      driver: GPKG

# implicit write at end if no write was specified.

Yet to be determined is the syntax of which functions go where (functions on components vs functions on the model) and whether we can move everything to components or not. We decided to go for this version for the alpha as it is the least amount of changes on both our side and the plugins and then as part of the plugin feedback we can refine what is feasible.

savente93 avatar Mar 12 '24 09:03 savente93