hydromt
hydromt copied to clipboard
Tracking issue for integrating and harmonising Model and components
Kind of request
Changing existing functionality
Enhancement Description
Requirements:
- Model needs to be able to add components etc.
- #813 should be solved to be able to create grid components
- #821 so we can harmonize the components
- CLI/yaml users need a way to list the needed components
- Should user be able to determine model/target CRS at init? (#827 )
- What are the abstract methods of
ModelComponent
?
Use case
No response
Additional Context
- init_model
components:
- grid
kind: GridComponent
- subgrid:
kind: GridComponent
- config:
kind: ConfigComponent
- geoms:
kind: RegionComponent
After a discussion with @deltamarnix @Tjalling-dejong and @savente93 we came up with the following proposal for how the yaml should look going forward (names could still be changed if necessary):
components:
- grid: GridComponent
- sub_grid: GridComponent
- region: RegionComponent
- ....
steps:
- create:
- name: grid
- kind: GridComponent
- [further kwargs]
- create:
- name: sub_grid
- kind: GridComponent
- [further kwargs]
# - create:
# - name: subb_grid <- would error because of incorrectly spelled name
# - kind: GridComponent
# - [further kwargs]
....
- write
This would mean that the names of the component and their types/kinds have to be declared up front in a model config. This has the benefit that we can do type checking (are functions available for the component they are called for), provide feedback if users mistype component names, and being a clear overview of which parts are included in a model for someone looking at a new project.
This is also where we could check several things about the names of components including:
- they are unique
- they are valid attribute names (so that if a user specifies we have a
grid
component they can access it asmodel.grid
- all components referenced in the steps exist
Given that components are so interdependent (e.g. each component should read the model mode from the model) it seems pretty clear that components should not be used outside of the model context. To facilitate this, as well as the very diverse arguments needed to initialise them, the initial list of components will just add empty components to the model. These components should then be populated with data by function calls in the steps part. This means that the model would be responsible for creating and registering the created components. The upside of this is that we can do a lot of the leg work such as providing the components with the right references and type checking for the user.
We've been playing around, also with the write
step and found that it would probably be easier to stick closer to the Github actions syntax. Where with
will be the kwargs
that go to the specific step. on
will be the component to call the function on, and step
is the name of the method to call.
components:
- grid: GridComponent
- subgrid: GridComponent
steps:
- step: create
on: grid
with:
shape: [10, 10]
dtype: float32
fill: 0.0
- step: create
on: subgrid
with:
shape: [10, 10]
dtype: float32
fill: 0.0
We did some further refining still and so far have landed on this:
components:
grid:
type: GridComponent
filename: grid.nc
subgrid:
type: GridComponent
filename: subgrid.nc
steps:
- step: create
on: grid
with:
shape: [10, 10]
dtype: float32
fill: 0.0
- step: create
on: subgrid
with:
shape: [10, 10]
dtype: float32
fill: 0.0
This makes it very clear what things are for hydromt and which are for the underlying functions. While I'm fine with changing the names if others think something else will be clearer, but I do think the approach is a good one. The one thing we still have to discuss is how we determine which components to write at the end. Do we want to maintain the current implementation of writing everything unless a user specifies which to write? if so, how do we determine how to write them? (e.g. if the user doesn't specify anything, how do we determine what the kwargs et. al. should be)
Hi all, nice thinking!
And indeed it makes sense to initialize the (empty) components at the beginning and then populate them through Model.
I also like to add a general steps
and list all the model methods to call. It avoids duplicated name and I think it is also clearer that methods are executed in the order the user lists them in the config!
A few extra comments:
-
model init: at the beginning, instead of just listing components, the user should just be able to change any model argument init like for example data_libs or others. So I would suggest something more general so that the model can just be fully initialized and not just the components. So far we are using the keyword
global
for this but not sure if it is the best name. -
component init: also good if the components and init attributes can be specified from config. Filename can be useful to change (so far only the config filename was supported in model init with
config_fn
argument). So I like the way this is implemented now! -
model steps: most methods in plugins can change several components so for example very few
WflowModel
methods will use theon
argument so I wonder if we do need it (example below with setup_reservoirs that updates grid. geoms and config). What is the disadvantage of calling justgrid.create
and simplifying the yaml file? -
write: so in the current implementation if the user does not specify any kwargs we just write the whole model files using defaults. Plugins make sure the default work for their own model. The thing we should encourage is that the
write
method of core just has acomponents
argument which list the components to write with default argument. So if you want to change the defaults for one, you can just take this one out of the generalwrite
and then call its specific method. - write behavior: if we are going to have steps now, I wonder if we do not encourage the user to always add write in the steps to write the model. It breaks current behavior but it would maybe make it clearer.
So to recap, here is what the yaml file could look like:
global:
data_libs: deltares_data
components:
grid:
type: GridComponent
geoms:
type: GeomsComponent
config:
type: ConfigComponent
filename: wflow_sbm_calibrated.toml
steps:
- grid.create:
shape: [10, 10]
dtype: float32
fill: 0.0
- setup_reservoirs:
reservoirs_fn: hydro_reservoirs
min_area: 1.0
- write:
components:
- grid
- config
- geoms.write:
filename: geoms/*.gpkg
driver: GPKG
Well while writing this example, I think the component init actually does not look so good. I kind of hope that most users by using a plugin will not have to initialize components themselves as this is in my view quite complex. But right now, in order to change the default filename of one component, all components need to be listed still... Why we only supported changing the name of config in config_fn
is actually because for most models, the names of other components are either fixed or derived from the config file. So maybe we could keep the current behavior and only allow to change the config filename in the Model init base argument. Anyhow components don't really have init attributes apart from the model weakref so we could also think of simplifying by just giving the components list as in the first example and keep the current workarounds. E.g if the grid filename was not in the config, you could (fictional example):
global:
config_fn: wflow_sbm_calibrated.toml
steps:
- grid.read:
filename: staticmaps_calibrated.nc
- setup_reservoirs:
reservoirs_fn: hydro_lakes
min_area: 1.0
- write:
components:
- config
- geoms
- grid.write:
filename: staticmaps_calibrated_reservoirs.nc
This is potentially a significant change for the user so I wonder if we should not maybe open a discussion, suggest two or three options with before/after and plugin/no plugin and post in the hydromt chat to get votes and feedback?
Another refinement with @deltamarnix @hboisgon @savente93 yielded the following proposal:
model_type: wflow # <- entrypoint
global:
data_libs: deltares_data
components:
config:
filename: wflow_sbm_calibrated.toml
steps:
- grid.create:
shape: [10, 10]
dtype: float32
fill: 0.0
- setup_reservoirs:
reservoirs_fn: hydro_reservoirs
min_area: 1.0
- write:
components:
- grid
- config
- geoms.write:
filename: geoms/*.gpkg
driver: GPKG
# implicit write at end if no write was specified.
Yet to be determined is the syntax of which functions go where (functions on components vs functions on the model) and whether we can move everything to components or not. We decided to go for this version for the alpha as it is the least amount of changes on both our side and the plugins and then as part of the plugin feedback we can refine what is feasible.