hydromt icon indicating copy to clipboard operation
hydromt copied to clipboard

Move region specific code from GridComponent to ModelRegionComponent

Open deltamarnix opened this issue 1 year ago • 14 comments

Kind of request

Changing existing functionality

Enhancement Description

GridComponent in the v1 branch contains quite some code that should actually be part of ModelRegionComponent. The code should be separated more so that it is easier to reuse.

  • region is still a default component in Model. So it will always be present.
  • region.create will only be called implicitly if a region is passed as an argument in the command line.
  • region.create is still a valid step in the configuration yml
  • grid.create has an optional argument region to initialize the referenced region. Which is by default region, but that can be altered via the global.components input in the yml.

Use case

GridComponent should be able to reference to different region objects. For example, when we have a subgrid with a subregion

model_type: Model
global:
  components:
    region:
      type: ModelRegionComponent
    subgridregion:
      type: ModelRegionComponent
    grid:
      type: GridComponent
      region: region
    subgrid:
      type: GridComponent
      region: subgridregion

The region should be specified when setting up the GridComponent, so it can always reference to the right region.

The user should be able to specify the region directly via region.create, or via grid.create.

steps:
  - grid.create:
      region:
        bbox: [1,2,3,4]
# OR
  - region.create:
      region:
        bbox: [1,2,3,4]

Additional Context

Setting up these different components is something that could also be done within a model class when making a plugin like so:

class MyPluginModel(Model):
  def __init__(self):
    super().__init__()
    self.add_component("grid", GridComponent(self, region="region"))
    self.add_component("subgridregion", ModelRegionComponent(self))
    self.add_component("subgrid", GridComponent(self, region="subgridregion"))

GridComponent will keep a str to look up the component when required to prevent the garbage collector from cleaning up the code when needed.

class GridComponent(ModelComponent):
  def __init__(self, region: str):
    self.region_component = region

  def create(self, ...):
    region_component = self._model.get_component(self.region_component, ModelRegionComponent)

deltamarnix avatar Mar 20 '24 07:03 deltamarnix

After more discussion with @hboisgon and @savente93 , we have come to a bit different conclusion.

Every GridComponent will need its own region. Therefore, we will rename ModelRegionComponent to SpatialModelComponent. GridComponent will inherit from SpatialModelComponent. Therefore there will be no region to be found within a model. Instead, Model will contain a property region that will return the region of the main_region_component. This main region component will need to be set in the yaml config.

In this example you see how it simplifies the configuration. There is no need to add a region component anymore, since the grid is leading in this case. The initializer of ForcingComponent will have an optional region argument. If it is not set, it will grab the region from the model instead.

global:
  main_spatial_component: grid
  components:
    grid:
      type: GridComponent
    subgrid:
      type: GridComponent
    rain:
      type: ForcingComponent
      region: subgrid
steps:
  - subgrid.create:
      region:
        bbox: [1,2,3,4]
  - setup_forcings:
      region: subgrid
class SpatialModelComponent(ModelRegionComponent):
  def create(self, crs, region, ...):
    ...

class GridComponent(SpatialModelComponent):
  def create(self, crs, region, ...):
    ...

class Model:
  @property
  def region(self):
    return self.main_spatial_component.region

deltamarnix avatar Mar 20 '24 11:03 deltamarnix

After more discussion with @DirkEilander again, we have come to the conclusion that it's better to keep the ModelRegionComponent as it is and not let GridComponent subclass from it. We want to be able to share a region between components, like the example below, where rain uses the same subgridregion.

Suppose that you would only use HydroMT core:

global:
  components:
    subgridregion:
      type: ModelRegionComponent
    grid:
      type: GridComponent
      # region: region # The default is 'region', so the parameter is optional
    subgrid:
      type: GridComponent
      region: subgridregion
    rain:
      type: ForcingComponent
      region: subgridregion
steps:
  - grid.create:
      region: # passed to region.create. Optional.
        bbox: [1,2,3,4] 
  - subgrid.create:
      region: # passed to subgridregion.create. Optional
        bbox: [1,2,3,4] 
  - setup_forcings:
      forcing: rain_era5

Everything written in components can normally be initialized by the plugin subclass of Model. The code of GridComponent will create the associated region like so:

class GridComponent(ModelComponent):
  def create(self, region, ...):
    self.model.get_component(self._region_name).create(region=region)

class ModelRegionComponent(ModelComponent):
  def create(self, region, ...):
    if self.data is not None:
      raise AlreadyExists # we cannot allow create functions to be called when that data is already on disk.

We have also decided that we will remove region from the command line, as it improves reproducibility of the model. Regions should come from the yml file. This saves confusion if you have multiple regions defined in the model. It would not be clear which one would be overridden.

Together with @savente93 we discussed how we could ensure that a component is not initialized twice. We have decided that both ModelComponent.read and ModelComponent.create should set Component._initialized to True, so that it will not do anything twice.

@hboisgon , do you think we could handle all use cases with this solution? Or do you think we are missing out on something?

deltamarnix avatar Mar 21 '24 12:03 deltamarnix

Well let's take again Delft3D 1D2D model. Right now we have one big mesh object containing 1D and 2D meshes and the region is derived dynamically from mesh.total_bounds. I guess we could split into mesh1d and mesh2d, create a separate new object for 1D2Dlinks and merge them all together again only when writing.

But how we build mesh1d is that we do it little by little eg first add rivers to mesh1d then add pipes to mesh1d (and later channels and tunnels). But I cannot consider pipes and rivers as separate entity because to have a valid mesh1d network they need to be connected properly.

I tried here to find an example of 1D2D model and defining the region for mesh (1D+2D), mesh2d (floodplain grid), mesh1d (river+pipes). And with this implementation, I don't think I can define properly my mesh1d object:

global:
  components:
    mesh1dregion:
      type: ModelRegionComponent
    river1dregion:
      type: ModelRegionComponent
    pipe1dregion:
      type: ModelRegionComponent
    channel1dregion:
      type: ModelRegionComponent
    tunnel1dregion:
       type: ModelRegionComponent
    mesh1d:
      type: Mesh1DComponent
      region: mesh1dregion
    mesh2dregion:
      type: ModelRegionComponent
    mesh2d:
      type: Mesh2dComponent
      region: mesh2dregion
    strutures1d: #eg bridges
      type: GeomComponent
      region: mesh1dregion # eg to cover rivers and channels for bridges
    rain:
      type: ForcingComponent
      region: mesh2dregion
steps:
  - mesh2d.create:
      region: # passed to mesh2dregion.create. Optional.
        geom: mesh2dmask.shp
  - mesh1d.create_rivers:
      region: # passed to river1dregion.create. Optional
        geom: rivermask.shp
      rivers: osm_rivers
  - mesh1d.create_pipes:
      region: # passed to pipe1dregion.create
        geom: pipemask.shp
      pipes: osm_roads
  - mesh1dregion.create: # or mesh1dregion.finalize and we raise a NotImplemented method for create in Mesh1dRegion
  - region.create: # or region.finalize to get merge 1D2D total bounds, only if we can overwrite default region component
  - setup_bridges:
      bridges: bridges_data # needs mesh1dregion to be defined
  - setup_2drain:
      forcing: rain_era5

In that example, I don't see how I can create the mesh1dregion and also not the larger model region that contains 1D + 2D (not that it is needed per say I guess, unless hydromt enforces to have a component called region by default).

Unless we add a new function only for Delft3D that overwrites the core region.create method like region.finalize to create the region from mesh1d.total_bounds? But that does not seem too handy I think and maybe extra burden for the user.

image

hboisgon avatar Mar 22 '24 01:03 hboisgon

I see your point @hboisgon. We now allow for a region to be shared by one or more components, But a component can only have one region. You want multiple (sub)regions per component.

I wonder however if we can do this in core in a generic way. The main purpose of the ModelRegionComponent linked to a component class is to know what area to request data for in the add_xx_data methods, right? How could we in a generic way know for which method to use what region if the component has multiple regions?

In the plugin you know because the methods are specific for a certain subcomponent of mesh. Wouldn't it therefore be much easier to solve for this in the mesh component of the plugin by adding the subcomponent regions as properties to your subclass of the mesh component? These also wouldn't necessarily need to be of the ModelRegionComponent type but simply GeoDataFrame objects, which you could also read and write as part of the mesh read and write methods.

For instance (but there are many ways to do this) If you want the reuse the MeshComponent.create method you can set the mesh2d region as its main region. In each subsequent step you add a subcomponent region to a local property. Similarly you keep a 1D2D region property which you update with each new subcomponent region. Would that work?

DirkEilander avatar Mar 22 '24 07:03 DirkEilander

Or, since ModelRegionComponent.data is a GeoDataFrame object. we could also add multiple geometries to it. With e.g., a ModelRegionComponent.add_subregion method we could add the mesh1d, rivers etc subgregions. The total region is than simply a unary union of all geometries. While this solutions provides a more general way to set, read and write subregions it still requires a plugin that implements when to use which subregion in which method.

I think the core should accommodate plugins to create logical Model objects for specific model software but not everything can or should be solved in core.

DirkEilander avatar Mar 22 '24 07:03 DirkEilander

So I don't really care about the subregions for rivers and pipes if the region.create methods becomes a workflow function that can be called without creating a ModelRegionComponent. The one I care about is the mesh1dregion that I added need to add new objects like bridges structures. And this mesh1dregion I cannot create from a region dict but only from the mesh1d.total_bounds so we need to make sure in core that I can create this type of region still.

I'm not sure if what you suggest makes this possible?

hboisgon avatar Mar 22 '24 07:03 hboisgon

It sounds like we are looking for new types of components that the ModelRegionComponent cannot cater at the moment. I would say that we create some other classes that could solve this.

class MeshRegionComponent(ModelComponent):
    def __init__(self, model: Model, mesh: str):
        self._mesh_component = mesh

    def create():
        # do some magic here
        total_bounds = self._model.get_component(self._mesh_component, MeshComponent).total_bounds

And in the yaml we can make very specific types

global:
  components:
    mesh1dregion:
      type: MeshRegionComponent
      mesh: mesh1d
    mesh1d:
      type: Mesh1dComponent
steps:
  - mesh1d.create
  - mesh1dregion.create

This would give you the possibility to create a region after the mesh was created without having to make special functions like finalize.

As for the total Model.region. I am wondering if it is required. We could consider that region is neglected by the FM plugin, as there are so many different parts of the mesh. Another option that we could provide with core, would be a CompoundRegionComponent class that would look a bit like this:

global:
  components:
    region: # override the original Model.region component
      type: CompoundRegionComponent:
      parts:
        - mesh1dregion
        - mesh2dregion
    mesh1dregion:
      type: ModelRegionComponent # or special MeshRegionComponent?
    mesh2dregion:
      type: ModelRegionComponent
steps:
# do all the stuff
class CompoundRegionComponent(ModelComponent):
  def __init__(self, model: Model, parts: list[str]):
    self._region_parts = parts

  def create(self):
    # Do magic to merge all geoms.

  def write(self):
    pass # don't write, the components handle that themselves?

deltamarnix avatar Mar 22 '24 08:03 deltamarnix

I think your suggestion could work @deltamarnix. Compared to now, the user still needs to manually add mesh1dregion.create in the yml steps which was not necessary before. I think we could promote it anyway as a way for the user to double check he is done building the mesh and can now add other data to the model so might not be too bad.

Two things then to ensure the implementation works:

  • Your suggestion for MeshRegion(ModelComponent) means that the region argument of ModelComponent is not mandatory and type is not necessarily ModelRegionComponent.
  • Optional: to keep full flexibility I still then would prefer that the core region.create method that derives a geodataframe from a region dict calls an external workflow function so that I do not necessarily need to create a ModelRegionComponent to be able to parse that dict.

hboisgon avatar Mar 22 '24 08:03 hboisgon

The one I care about is the mesh1dregion that I need to add new objects like bridges structures. And this mesh1dregion I cannot create from a region dict but only from the mesh1d.total_bounds so we need to make sure in core that I can create this type of region still.

I understand this is not possible in the core with the current functionality. And I guess I'm missing something, but why is this not possible to do this in the plugin code? In plugins we can simply calculate unary_unions or merge data on the fly when we need it (and optionally cache the new data) right?

So for your example @hboisgon, if I understand correctly, the mesh1d.create_rivers and mesh1d.create_pipes construct a part of the mesh1d object right? In setup_bridges you could then calculate the mesh1d.total_bounds on the fly and set this to mesh1dregion if it was not yet initialized before getting data for that combined region, this could be done using the MeshRegionComponent.create method as @deltamarnix suggests. A user does not have to execute steps like mesh1dregion.create or mesh1dregion.finalize. You could do the same with the subregion geometries if these are cached.

to keep full flexibility I still then would prefer that the core region.create method that derives a geodataframe from a region dict calls an external workflow function so that I do not necessarily need to create a ModelRegionComponent to be able to parse that dict.

I agree. Besides a workflow method this could also be achieved with Region class that @savente93 was working on earlier. In your usecase however, wouldn't a user always use geometries for river and pipes regions and hence can't you treat these as simple geodataframe data sources?

The workflow would than simply look like:

global:
  components:
    mesh1dregion:
      type: ModelRegionComponent
    mesh1d:
      type: Mesh1DComponent
      region: mesh1dregion
steps:
  - mesh1d.create_rivers:
      river_mask: rivermask.shp # optionally saved as subregion in mesh1d or mesh1dregion
      rivers: osm_rivers
  - mesh1d.create_pipes:
      pipes_mask: pipemask.shp # optionally saved as subregion in mesh1d or mesh1dregion
      pipes: osm_roads
  - setup_bridges:  # calculate mesh1dregion on the fly if not initialized based on mesh1d.total_bounds or a unary_union of all subregions.
      bridges: bridges_data

DirkEilander avatar Mar 22 '24 11:03 DirkEilander

sub regions are currently not part of the simple ModelRegionComponent. If we want to expand on different types of region definitions I would go for something like a CompoundRegionComponent, and not bloat ModelRegionComponent with even more functionality.

deltamarnix avatar Mar 22 '24 11:03 deltamarnix

Hi @DirkEilander and @deltamarnix

I indeed don't think that we need CompoundRegionComponent for this and that I can create the mesh1dregion on the fly in the plugin. So like I mention in my last answer, I would be okay with the implementation but now that we are validating a lot of things in the core, when adding validation for ModelRegionComponent then the use case of mesh1dregion should be feasible (ie if it is a ModelRegionComponent I can overwrite methods and create it without a region dict).

For the case of pipes and rivers I would still like to allow for different region types than just geom (eg bbox but also maybe interbasin for rivers, like sfincs has).

hboisgon avatar Mar 25 '24 01:03 hboisgon

I don't have a PR yet, but this is what it would look like

https://github.com/Deltares/hydromt/compare/refactor-model...region-component-2

So far I have:

  • Removed region as a command line argument
  • Given GridComponent a reference name to the region it should connect to
  • ModelRegionComponent handles the crs code that was first only in GridComponent
  • GridComponent first calls region.create() inside of its create function

Implications:

  • It is no longer possible to set region from the command line. This is part of your yaml.
  • There are a couple of TODOs in region.py that I think might be revised.
  • An empty model no longer calls region.create during build by default. And it no longer removes region.create from update, because we don't know if region.create is a direct call, or an indirect call in GridComponent (or any other component in the future).

My question to @hboisgon and @DirkEilander:

  • Are all use cases handled as expected?
  • Do you agree on removing region from the command line?
  • Do you agree on no longer calling region.create by default?
  • Can you take a look at the TODOs in region.py?
  • Some things, like CompoundRegionComponent, might not be covered yet, but it sounds like a new feature.
  • Should we keep the default ModelRegionComponent in an empty model?

deltamarnix avatar Mar 26 '24 09:03 deltamarnix

@hboisgon and I discussed you question, here's our reply:

Are all use cases handled as expected?

There is one more important use case that is not yet handled. When we want to update an existing model that does not have the region geometries saved to files, the region cannot be read and should be inferred from the component data. This is often the case as the region files are hydromt files and not strictly model files, like other component data. For instance when adding a new grid variable to an existing the model which already has a grid defined we want to only request data for the grid domain, hence GridRegion should be based on the bounding box of GridComponent.data in this case. In a plugin this could be refined based on a specific grid variable. We could potentially solve this with the pseudo-code below

Secondly, we want to be able to parse dict regions outside to RegionComponent. We could simple achieve this by making it a function in workflows again. At a later stage it could be improved by making a separate Region class like Sam was working on.

Do you agree on removing region from the command line?

yes

Do you agree on no longer calling region.create by default?

yes

Can you take a look at the TODOs in region.py?

The crs argument in the RegionComponent.create method is the destination crs, we use it in case the model grid/mesh/etc should in a different crs than the region input dict. Perhaps renaming it to dst_crs, like in gdal/rasterio methods, would make it more clear. A separate crs argument in the region dict could be used to set the crs of the bbox. If not provided is should default to 4326

Some things, like CompoundRegionComponent, might not be covered yet, but it sounds like a new feature.

Not needed we think

Should we keep the default ModelRegionComponent in an empty model?

No strong opinions about this one. We could keep it for users to still use model.region as a property. However we agree with not making any assumptions about this region in model.create or model.build.

class RegionComponent():
    _DEFAULT_REGION_FILE_PATH = Path("region.geojson") # this should be class property so it can be overridden by subclasses

    def __init__(self, model, component_name=None) -> None:

        self.component_name: str = component_name
        # make sure the region component has the infer_region method; not all components have it (e.g. tables); it cannot point not components that do not have this method.
        assert hasattr(self.model.get_component(component_name), 'infer_region')

    def infer_from_component(self, component_name: str) -> None:
        component = self.model.get_component(component_name)
        if component.data is not None:
            self.set(component.infer_region())
        else:
            self.logger.warning(f"Region could not be inferred from {component_name}")

    @property
    def data(self) -> Optional[GeoDataFrame]:
        """Provide access to the underlying GeoDataFrame data of the model region."""
        if self._data is None and self._root.is_reading_mode():
            if (self.model.root / self._DEFAULT_REGION_FILE_PATH).exist():
                self.read()
            else:
                self.infer_from_component()

        return self._data

DirkEilander avatar Mar 27 '24 11:03 DirkEilander

Secondly, we want to be able to parse dict regions outside to RegionComponent.

Can you help me understand why this is necessary? I thought that part of how we wanted to design the region component was that it could handle all of these parsing and converting tasks, Having to place it outside of RegionComponent is going to give a lot of trouble with circular imports again I think .

savente93 avatar Mar 27 '24 14:03 savente93

After lots of PR comments and discussions we have decided to come up with the following:

There is a base class SpatialModelComponent, this class provides region, crs, and bounds properties. It is possible for a SpatialModelComponent to reference to another SpatialModelComponent. In that case the referenced component will actually be providing that region and bounds information. SpatialModelComponent handles this.

We have decided that GridComponent and MeshComponent will never reference to another SpatialModelComponent, as they will always need a definition in the create function.

Region parsing has been split into parse_region_grid, parse_region_bbox, etc., as all of these functions seem to have different input options, but also different output types. Separating the different functions make it easier to understand what will come out of the function.

deltamarnix avatar May 15 '24 14:05 deltamarnix