hydromt
hydromt copied to clipboard
Catalogs refactor
Issue addressed
Fixes #844
Explanation
Explain how you addressed the bug/feature request, what choices you made and why.
Checklist
- [ ] Updated tests or added new tests
- [ ] Branch is up to date with
main
- [ ] Tests & pre-commit hooks pass
- [ ] Updated documentation if needed
- [ ] Updated changelog.rst if needed
- [ ] For predefined catalogs: update the catalog version in the file itself, the references in data/predefined_catalogs.yml, and the changelog in data/changelog.rst
Additional Notes (optional)
Add any additional notes or information that may be helpful.
TODO:
- [x] try to fix linux vs windows os issue with hash; if not easily fixed remove it
- [x] add older version to artifact_data and deltares_data (check if version numbering makes sense)
- [x] docs and changelogs
- [x] cache data_catalog.yml files in DataCatalog._cache_dir/<catalog_name>/
/data_catalog.yml - [x] if versions.yml not accessible use predefined_catalogs._get_catalog_versions to browse local versions
Decided to postpone to v1
- [ ] create entrypoint for data catalogs plugins
@DirkEilander in the case you are still working, I do not really understand what I need to mock with the entry_points in _get_catalog_eps. The entry_points are empty in the tests and where is hydromt.catalogs entrypoint defined?
@DirkEilander in the case you are still working, I do not really understand what I need to mock with the entry_points in _get_catalog_eps. The entry_points are empty in the tests and where is hydromt.catalogs entrypoint defined?
@Tjalling-dejong The idea is to test predefined catalogs which are provided by entrypoints. These can be set by users in their pyproject.toml as a "hydromt.catalogs" entrypoint and are discovered in predefined_catalogs.py. For the core these are bypassed as LOCAL_EPS.
I'm however also fine with skipping this test for now as @savente93 has done a really nice job in v1 to generalize all entrypoints in plugins.py. It would be nice to use that after merging this work in v1 too in which case the tests will probably have te be rewritten. Compared to the other entrypoints this refers to a versions.yml
file rather than a class though. But if needed we could refactor the predefined catalog versions into a simply (pydantic) class as well.
@Tjalling-dejong if you want me to go over how we did plugins.py
let me know
@Tjalling-dejong can you please update the title, description and checklist?
@Tjalling-dejong @savente93 I refactored the implementation as I found that the hashes where still not working and the required functionality was a bit hidden in the DataCatalog
class.
I therefore creating a new PredefinedCatalog
class which is responsible for retrieving the right data_catalog.yml file.
These files should be stored in a specific structure, see PredefinedCatalog
docstring and contain a pooch registry.txt file. This file contains an overview of all catalog files including its version and sha256 hash. Using pooch we retrieve the catalog files which are cached to the hydromt cache dir. This means we only need to download a data catalog file once and in case the registry.txt file cannot be accessed we can use the cached files.
The PredefinedCatalog
class is subclassed per predefined data catalog (e.g. DeltaresDataCatalog
for deltares_data) to make it easy to add it to the plugin infrastructure later. A subclass only changes two class variables the base url of the predefined catalog files and its name.
I'm going to be on leave from next week. Feel free to undo my commits if you don't like it or make changes as deemed necessary.
@savente93 I didn't manage to update the pixi.toml file so thats why the tests are failing. The pip test runs, so I think it should work with pixi too.
I like the idea for this refactor. I haven't heard of Pooch before and it seemed like a relatively new project but given that a lot of big libraries like scipy and xarray also depend on it I think adding this is okay. I'll go through the code for a proper review in a minute, but I want to see the tests passing before we merge. @Tjalling-dejong will you need help with that?
I like the idea for this refactor. I haven't heard of Pooch before and it seemed like a relatively new project but given that a lot of big libraries like scipy and xarray also depend on it I think adding this is okay. I'll go through the code for a proper review in a minute, but I want to see the tests passing before we merge. @Tjalling-dejong will you need help with that?
Yes please, I dont know how to fix the failing tests
@savente93 Can you review this PR? Note that the Build Documentation fails because it tries to retrieve a data catalog versions.yaml from main that does not exist yet.
@Tjalling-dejong I'll have a more indepth look in a bit, the failing docs are okay, but there are still a few issues flagged my sonar cube and the PR description is still empty. Could you fix those in the mean time please
So, perhaps a stupid question but: is this change backwards compatible? i.e. will previous versions of hydromt break if we merge this? (I'll do a review regardless but I'd like some confirmation that this won't break things again before we merge)
So, perhaps a stupid question but: is this change backwards compatible? i.e. will previous versions of hydromt break if we merge this? (I'll do a review regardless but I'd like some confirmation that this won't break things again before we merge)
I dont think it will break previous versions of hydromt. It works even if you do not specify a data catalog version.