hydromt
move to intake?
I have done some research into the pros and cons of moving from our own DataCatalog and IO to intake.
Why intake:
- larger community
- growing set of exposed datasets, e.g. the Pangeo Catalog and remote climate data
- support for STAC (with intake-stac plugin)
- support for cloud based "analysis ready" data from AWS S3, GCS etc
- support for caching of data where direct access is not possible
- support for nested catalogs
- many more drivers available, see list of plugins
Challenges when moving to intake:
- new yaml format (breaking change), and automatic foolproof conversion seems hard!
- our preprocessing steps (rename, unit conversion, etc.) should be split from the intake of data. Related arguments can be set only in the metadata section of an intake data source in the yaml.
- support for some hydromt specific drivers, specifically raster_tindex, vector, geodataset
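To make the second challenge concrete, here is a minimal sketch of how one of our entries might be mapped onto the intake layout (driver / args / metadata), with the preprocessing arguments moved into metadata. The hydromt key names (path, driver, kwargs, meta, rename, unit_mult), the example values and the helper to_intake_source are assumptions for illustration only; this is not an existing converter, and it ignores the harder cases that make a foolproof conversion difficult.

```python
# Sketch only: key names on the hydromt side are assumed for illustration.

def to_intake_source(entry: dict) -> dict:
    """Map one hydromt-style entry onto an intake-style source dict.

    Reading arguments go to ``args``; everything else (rename, unit
    conversion, crs, ...) is parked in ``metadata`` so that preprocessing
    stays separate from the actual reading of the data.
    """
    entry = dict(entry)  # shallow copy, leave the caller's dict untouched
    driver = entry.pop("driver")
    args = {"urlpath": entry.pop("path")}
    args.update(entry.pop("kwargs", {}))  # driver-specific read options
    metadata = dict(entry.pop("meta", {}))
    metadata.update(entry)  # remaining keys are preprocessing settings
    return {"driver": driver, "args": args, "metadata": metadata}


# Hypothetical entry, not taken from an actual catalog.
hydromt_entry = {
    "path": "data/era5_{year}.nc",
    "driver": "netcdf",
    "kwargs": {"chunks": {"time": 100}},
    "rename": {"tp": "precip"},
    "unit_mult": {"precip": 1000},
    "meta": {"category": "meteo"},
}

intake_source = to_intake_source(hydromt_entry)
```

The preprocessing itself would then have to run in hydromt after intake has returned the data, reading its settings back from the source metadata.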
Alternative: support for intake in our own catalog system
- by setting up a nested catalog system we could potentially support both catalog types; the main Catalog would then be a set of (hydromt or intake) Catalogs, each with a set of data sources
- PROS: non-breaking and easier (cheaper) to set up while keeping everything running.
- CONS: different yaml formats might be confusing to users; much more maintenance
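A rough sketch of what such a nested setup could look like, using plain dataclasses rather than the real hydromt or intake classes. The names SubCatalog, MainCatalog and get_source, as well as the example sources, are hypothetical and only illustrate the lookup across both catalog kinds.

```python
from dataclasses import dataclass, field


@dataclass
class SubCatalog:
    """One catalog of a single kind, holding its own data sources."""
    kind: str  # "hydromt" or "intake"
    sources: dict = field(default_factory=dict)


@dataclass
class MainCatalog:
    """Top-level catalog: a named set of hydromt or intake sub-catalogs."""
    catalogs: dict = field(default_factory=dict)

    def get_source(self, name: str):
        """Return (kind, source) for a source name, searching all sub-catalogs."""
        matches = [cat for cat in self.catalogs.values() if name in cat.sources]
        if not matches:
            raise KeyError(f"data source {name!r} not found")
        if len(matches) > 1:
            raise ValueError(f"data source {name!r} is defined in several catalogs")
        cat = matches[0]
        # The caller still needs the kind to pick the right reader.
        return cat.kind, cat.sources[name]


main = MainCatalog(
    catalogs={
        "local": SubCatalog("hydromt", {"dem": {"path": "dem.tif", "driver": "raster"}}),
        "remote": SubCatalog("intake", {"era5": {"driver": "netcdf", "args": {"urlpath": "era5.nc"}}}),
    }
)
kind, source = main.get_source("era5")
```

Note that the caller has to branch on the catalog kind for every source, which is where the extra maintenance listed under CONS would come from.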
@evetion @hboisgon Curious to hear your thoughts.
Another point for intake is that we can remove most drivers from here; a smaller package is always nice.
If you move, you might create an intake plugin for your tiled raster input; I think that the vector datasets are already supported? You might also split the Catalog and IO parts: you could move to the intake yaml format (rename/nest some keys) before you actually replace the drivers one by one.
Lastly, I wouldn't start using a hybrid approach where you can parse both types of catalogs, I don't think it will be cheaper (at least not in the long run).
see also #67