improve predefined catalog implementation

Open DirkEilander opened this issue 11 months ago • 4 comments

Current implementation

Currently, the data/predefined_catalogs.yml file describes which predefined catalogs are available and where to find them. The DataCatalog.set_predefined_catalogs() method reads this file from the main branch to set the DataCatalog.predefined_catalogs property. If the file cannot be accessed, an error is raised. The data catalog files themselves are stored in data/catalogs, and versioning is based on git revision hashes; the latest version is always assumed to be on the main branch.

There are a few issues with this implementation:

  • without internet access the DataCatalog does not work
  • we cannot properly test the predefined_catalogs.yml file, as the code always looks at the version on the main branch
  • we cannot (easily) fix bugs in old data catalog versions
  • if we change the predefined_catalogs.yml format, all previous hydromt versions may break

Enhancement Description

How I would like to see this functioning:

  • data catalogs need their own semantic versioning scheme for users to be able to use older versions for reproducibility.
  • the possibility to publish bug fixes to current and older versions with a patch release.
  • DataCatalog should always initialize (and not break because an online file is not found)
  • the possibility to add predefined catalogs by plugins (new)

Possible implementation:

  • catalogs are published on a separate branch (e.g. like github-pages) in a fixed scheme (e.g. "/<name>/<version>/data_catalog.yml"). This allows for updating older versions.
  • an overview of predefined catalogs and versions is contained in the codebase (e.g. in a new PredefinedCatalogs class) which is initialized with catalogs exposed by core and plugins via entrypoints
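
A minimal sketch of how the fixed path scheme and the in-code overview could fit together. Everything here is an assumption for illustration: the PredefinedCatalog dataclass, the register() and get_url() methods, and the example base URL are hypothetical and not part of the current hydromt API. Discovery through entry points is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PredefinedCatalog:
    """One named catalog and its published versions (hypothetical helper)."""

    name: str
    base_url: str
    versions: list[str] = field(default_factory=list)

    def url(self, version: str) -> str:
        """Build the location following "/<name>/<version>/data_catalog.yml"."""
        if version not in self.versions:
            raise ValueError(
                f"unknown version {version!r} for catalog {self.name!r}"
            )
        base = self.base_url.rstrip("/")
        return f"{base}/{self.name}/{version}/data_catalog.yml"


class PredefinedCatalogs:
    """In-code overview of catalogs contributed by core and plugins (hypothetical)."""

    def __init__(self) -> None:
        self._catalogs: dict[str, PredefinedCatalog] = {}

    def register(self, catalog: PredefinedCatalog) -> None:
        # A later registration with the same name replaces the earlier one.
        self._catalogs[catalog.name] = catalog

    def get_url(self, name: str, version: str) -> str:
        # Raises KeyError for an unknown catalog name.
        return self._catalogs[name].url(version)
```

Because the overview lives in the codebase, building it needs no network access; only fetching an individual data_catalog.yml does.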

Additional Context

This is also discussed in https://github.com/Deltares/hydromt/discussions/737

DirkEilander avatar Mar 13 '24 15:03 DirkEilander

@savente93 @Jaapel @Tjalling-dejong @deltamarnix I wrote this issue as a starting point for our discussion tomorrow. It would be great if you could have a quick look beforehand.

DirkEilander avatar Mar 13 '24 15:03 DirkEilander

just as a primer for our discussion: a common way to do this is to make protected branches for each released version, as it means that we can supply bug fixes independently.

savente93 avatar Mar 14 '24 08:03 savente93

Outcome of discussion

  • use semantic versioning: the catalog format version (major); breaking changes in the catalog, such as a new data version or a rename (minor); bug fixes (patch). Using a catalog format version (instead of hydromt_version compatibility) makes it easier to maintain catalogs.
  • save the data catalog files in a fixed scheme on the main branch, e.g. "/<name>/<version>/data_catalog.yml". These files are essentially immutable: for each version we make a new file.
  • There is one root catalog file (now the predefined_catalogs.yml file) that contains a list of all available versions per predefined data catalog. This file is editable: we add each new data catalog version to the list.
  • The URI to the root catalog file is supplied as an entrypoint so that plugins can also define their own catalog overview files.
  • We also version this root catalog file to allow for possible future changes. Old HydroMT versions will then continue to work, as these still have entrypoints to the old version.
  • We make sure that if the root catalog file is not found, a warning (instead of an error) is given, to make hydromt less dependent on a single remote file.
  • For testing purposes we read the overview file and catalogs directly from the repo within the same branch, to make sure we test the files in that branch (and not main).
  • Question: Do we implement this already in v9.x or v1? v1 might be more pragmatic (and realistic). However, this situation is blocking further development of the catalogs.
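
Two of the points above, sketched under stated assumptions: picking a catalog version from the list in the root catalog file, and downgrading a missing root file to a warning. The function names (parse_version, latest_version, read_root_catalog) and the "vMAJOR.MINOR.PATCH" string format are hypothetical; the reader is passed in as a callable so the sketch does not assume how hydromt actually fetches the file.

```python
import warnings
from typing import Callable, Optional


def parse_version(version: str) -> tuple[int, int, int]:
    """Turn "v1.2.3" or "1.2.3" into a tuple that sorts numerically."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def latest_version(versions: list[str], major: Optional[int] = None) -> str:
    """Return the newest version, optionally restricted to one format (major) version."""
    candidates = [
        v for v in versions if major is None or parse_version(v)[0] == major
    ]
    if not candidates:
        raise ValueError(f"no catalog version found for major={major}")
    # Compare as integer tuples so that v0.10.0 sorts after v0.9.0.
    return max(candidates, key=parse_version)


def read_root_catalog(reader: Callable[[], dict]) -> dict:
    """Read the root catalog; warn and return an empty overview if it is unreachable."""
    try:
        return reader()
    except OSError as err:
        warnings.warn(f"Could not read root catalog file: {err}", stacklevel=2)
        return {}
```

Restricting the search to one major version is what would let an older HydroMT release keep resolving catalogs in a format it understands while newer formats are published alongside.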

@savente93 @Jaapel @deltamarnix Can you let me know if I missed anything?

DirkEilander avatar Mar 14 '24 17:03 DirkEilander

I was thinking: to maintain backwards compatibility with the HydroMT versions that are already out there in the world, we could keep predefined_catalogs.yml and all the corresponding catalogs for now, and call that v0. We then build a copy of all the files next to them and call that predefined_catalogs.v1.yaml. That should keep all old HydroMT versions working for now, as they still depend on the v0 version.
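
A small sketch of that naming idea, assuming each HydroMT release hard-codes the root catalog format version it understands. The function name root_catalog_filename is hypothetical, and extending the ".v<N>.yaml" pattern beyond v1 is an assumption, not something decided in this thread.

```python
def root_catalog_filename(format_version: int) -> str:
    """Map a root catalog format version to its file name (hypothetical helper)."""
    if format_version < 0:
        raise ValueError("format_version must be non-negative")
    if format_version == 0:
        # The existing, unversioned file that released HydroMT versions read.
        return "predefined_catalogs.yml"
    # Newer formats live next to the old file under a versioned name.
    return f"predefined_catalogs.v{format_version}.yaml"
```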

deltamarnix avatar Mar 15 '24 06:03 deltamarnix