hydromt icon indicating copy to clipboard operation
hydromt copied to clipboard

Catalogs refactor

Open DirkEilander opened this issue 11 months ago • 8 comments

Issue addressed

Fixes #844

Explanation

Explain how you addressed the bug/feature request, what choices you made and why.

Checklist

  • [ ] Updated tests or added new tests
  • [ ] Branch is up to date with main
  • [ ] Tests & pre-commit hooks pass
  • [ ] Updated documentation if needed
  • [ ] Updated changelog.rst if needed
  • [ ] For predefined catalogs: update the catalog version in the file itself, the references in data/predefined_catalogs.yml, and the changelog in data/changelog.rst

Additional Notes (optional)

Add any additional notes or information that may be helpful.

DirkEilander avatar Mar 22 '24 17:03 DirkEilander

TODO:

  • [x] try to fix linux vs windows os issue with hash; if not easily fixed remove it
  • [x] add older version to artifact_data and deltares_data (check if version numbering makes sense)
  • [x] docs and changelogs
  • [x] cache data_catalog.yml files in DataCatalog._cache_dir/<catalog_name>//data_catalog.yml
  • [x] if versions.yml not accessible use predefined_catalogs._get_catalog_versions to browse local versions

Decided to postpone to v1

  • [ ] create entrypoint for data catalogs plugins

DirkEilander avatar Mar 28 '24 12:03 DirkEilander

@DirkEilander in the case you are still working, I do not really understand what I need to mock with the entry_points in _get_catalog_eps. The entry_points are empty in the tests and where is hydromt.catalogs entrypoint defined?

Tjalling-dejong avatar Apr 17 '24 15:04 Tjalling-dejong

@DirkEilander in the case you are still working, I do not really understand what I need to mock with the entry_points in _get_catalog_eps. The entry_points are empty in the tests and where is hydromt.catalogs entrypoint defined?

@Tjalling-dejong The idea is to test predefined catalogs which are provided by entrypoints. These can be set by users in their pyproject.toml as a "hydromt.catalogs" entrypoint and are discovered in predefined_catalogs.py. For the core these are bypassed as LOCAL_EPS.

I'm however also fine with skipping this test for now as @savente93 has done a really nice job in v1 to generalize all entrypoints in plugins.py. It would be nice to use that after merging this work in v1 too in which case the tests will probably have te be rewritten. Compared to the other entrypoints this refers to a versions.yml file rather than a class though. But if needed we could refactor the predefined catalog versions into a simply (pydantic) class as well.

DirkEilander avatar Apr 18 '24 07:04 DirkEilander

@Tjalling-dejong if you want me to go over how we did plugins.py let me know

savente93 avatar Apr 18 '24 07:04 savente93

@Tjalling-dejong can you please update the title, description and checklist?

savente93 avatar Apr 18 '24 13:04 savente93

@Tjalling-dejong @savente93 I refactored the implementation as I found that the hashes where still not working and the required functionality was a bit hidden in the DataCatalog class.

I therefore creating a new PredefinedCatalog class which is responsible for retrieving the right data_catalog.yml file. These files should be stored in a specific structure, see PredefinedCatalog docstring and contain a pooch registry.txt file. This file contains an overview of all catalog files including its version and sha256 hash. Using pooch we retrieve the catalog files which are cached to the hydromt cache dir. This means we only need to download a data catalog file once and in case the registry.txt file cannot be accessed we can use the cached files. The PredefinedCatalog class is subclassed per predefined data catalog (e.g. DeltaresDataCatalog for deltares_data) to make it easy to add it to the plugin infrastructure later. A subclass only changes two class variables the base url of the predefined catalog files and its name.

I'm going to be on leave from next week. Feel free to undo my commits if you don't like it or make changes as deemed necessary.

@savente93 I didn't manage to update the pixi.toml file so thats why the tests are failing. The pip test runs, so I think it should work with pixi too.

DirkEilander avatar Apr 20 '24 10:04 DirkEilander

I like the idea for this refactor. I haven't heard of Pooch before and it seemed like a relatively new project but given that a lot of big libraries like scipy and xarray also depend on it I think adding this is okay. I'll go through the code for a proper review in a minute, but I want to see the tests passing before we merge. @Tjalling-dejong will you need help with that?

savente93 avatar Apr 22 '24 09:04 savente93

I like the idea for this refactor. I haven't heard of Pooch before and it seemed like a relatively new project but given that a lot of big libraries like scipy and xarray also depend on it I think adding this is okay. I'll go through the code for a proper review in a minute, but I want to see the tests passing before we merge. @Tjalling-dejong will you need help with that?

Yes please, I dont know how to fix the failing tests

Tjalling-dejong avatar Apr 22 '24 09:04 Tjalling-dejong

@savente93 Can you review this PR? Note that the Build Documentation fails because it tries to retrieve a data catalog versions.yaml from main that does not exist yet.

Tjalling-dejong avatar Apr 23 '24 12:04 Tjalling-dejong

@Tjalling-dejong I'll have a more indepth look in a bit, the failing docs are okay, but there are still a few issues flagged my sonar cube and the PR description is still empty. Could you fix those in the mean time please

savente93 avatar Apr 23 '24 13:04 savente93

So, perhaps a stupid question but: is this change backwards compatible? i.e. will previous versions of hydromt break if we merge this? (I'll do a review regardless but I'd like some confirmation that this won't break things again before we merge)

savente93 avatar Apr 25 '24 11:04 savente93

So, perhaps a stupid question but: is this change backwards compatible? i.e. will previous versions of hydromt break if we merge this? (I'll do a review regardless but I'd like some confirmation that this won't break things again before we merge)

I dont think it will break previous versions of hydromt. It works even if you do not specify a data catalog version.

Tjalling-dejong avatar Apr 25 '24 12:04 Tjalling-dejong