move to intake?

Open DirkEilander opened this issue 2 years ago • 3 comments

I have done some research into the pros and cons of moving from our own DataCatalog and IO to intake.

Why intake:

  • larger community
  • growing set of exposed datasets, e.g. the Pangeo Catalog and remote climate data
  • support for STAC (with intake-stac plugin)
  • support for cloud based "analysis ready" data from AWS S3, GCS etc
  • support for caching of data where direct access is not possible
  • support for nested catalogs
  • many more drivers available, see list of plugins

Challenges when moving to intake:

  • new yaml format (breaking change); a fully automatic, foolproof conversion seems hard!
  • our preprocessing steps (rename, unit conversion, etc.) should be split from the intake of data. In an intake yaml, the related arguments can only be set in the metadata section of a data source.
  • support for some hydromt-specific drivers, specifically raster_tindex, vector geodataset
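
To make the second point concrete, here is a minimal sketch of preprocessing that runs after the data has been read and is driven only by a metadata dict. It is not hydromt or intake code: the function `apply_preprocessing`, the plain-dict data container, and the assumption that the metadata carries `rename`, `unit_mult` and `unit_add` keys (with the unit keys referring to the renamed variables) are all illustrative.

```python
def apply_preprocessing(data, metadata):
    """Apply rename and linear unit conversion to already-loaded data.

    data: dict mapping variable name -> list of floats (stand-in for a dataset).
    metadata: dict with optional 'rename', 'unit_mult', 'unit_add' entries
              (assumed key names, keyed by the *renamed* variable).
    """
    rename = metadata.get("rename", {})
    unit_mult = metadata.get("unit_mult", {})
    unit_add = metadata.get("unit_add", {})

    out = {}
    for name, values in data.items():
        new_name = rename.get(name, name)   # keep the name if no rename is given
        mult = unit_mult.get(new_name, 1.0)  # default: no scaling
        add = unit_add.get(new_name, 0.0)    # default: no offset
        out[new_name] = [v * mult + add for v in values]
    return out


# Example: Kelvin to degrees Celsius, with a rename from 't2m' to 'temp'.
raw = {"t2m": [273.15, 283.15]}
meta = {"rename": {"t2m": "temp"}, "unit_add": {"temp": -273.15}}
result = apply_preprocessing(raw, meta)
```

The point of the split is that the reader (whatever driver intake provides) never needs to know about these arguments; only this post-processing step does.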

Alternative: support for intake in our own catalog system

  • by setting up a nested catalog system we could potentially support both formats: the main Catalog would then be a set of (hydromt or intake) Catalogs, each with its own set of data sources
  • PROS: non-breaking and easier (cheaper) to set up while keeping everything running.
  • CONS: different yaml formats might be confusing to users; much more maintenance
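
A rough sketch of what such a nested main Catalog could look like, just to show the lookup. The class `NestedCatalog`, its methods, and the `"catalog/source"` key convention are hypothetical and not part of hydromt or intake; sources are plain dicts here.

```python
class NestedCatalog:
    """Main catalog holding named sub-catalogs, each in one yaml format."""

    SUPPORTED_FORMATS = ("hydromt", "intake")

    def __init__(self):
        self._catalogs = {}

    def add_catalog(self, name, sources, fmt):
        # Each sub-catalog remembers its format so the right parser/reader
        # could be chosen later (not implemented in this sketch).
        if fmt not in self.SUPPORTED_FORMATS:
            raise ValueError(f"unknown catalog format: {fmt!r}")
        self._catalogs[name] = {"format": fmt, "sources": dict(sources)}

    def get_source(self, key):
        # Keys are assumed to look like "<catalog name>/<source name>".
        cat_name, _, source_name = key.partition("/")
        catalog = self._catalogs[cat_name]
        return catalog["format"], catalog["sources"][source_name]


main = NestedCatalog()
main.add_catalog("local", {"dem": {"path": "dem.tif"}}, fmt="hydromt")
main.add_catalog("cloud", {"era5": {"driver": "zarr"}}, fmt="intake")
fmt, source = main.get_source("cloud/era5")
```

Every source access has to branch on the stored format, which is where the extra maintenance mentioned under CONS would come from.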

DirkEilander avatar Mar 24 '22 10:03 DirkEilander

@evetion @hboisgon Curious to hear your thoughts.

DirkEilander avatar Mar 24 '22 10:03 DirkEilander

Another point for intake is that we can remove most drivers from here; a smaller package is always nice.

If you move, you might create an intake plugin for your tiled raster input; I think the vector datasets are already supported? You might also split the Catalog and IO parts: you could move to the intake yaml format first (rename/nest some keys) before you actually replace the drivers one by one.
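
As an illustration of the "rename/nest some keys" step, a conversion of a single catalog entry could look roughly like this. The input layout (`path`, `driver`, `meta`, `rename`, `unit_mult`, `unit_add` as top-level keys) and the output layout (`driver`, `args.urlpath`, `metadata`) are my assumptions about the two yaml formats, and `to_intake_style` is a hypothetical helper; driver names are copied unchanged, so a real conversion would still need a driver mapping.

```python
PREPROCESSING_KEYS = ("rename", "unit_mult", "unit_add")


def to_intake_style(name, entry):
    """Nest one flat catalog entry into a driver/args/metadata layout."""
    entry = dict(entry)  # work on a copy, leave the caller's dict untouched
    metadata = dict(entry.pop("meta", {}))

    # Preprocessing arguments move into the metadata section.
    for key in PREPROCESSING_KEYS:
        if key in entry:
            metadata[key] = entry.pop(key)

    source = {
        "driver": entry.pop("driver"),            # copied as-is, no mapping
        "args": {"urlpath": entry.pop("path")},
        "metadata": metadata,
    }
    # Any remaining keys are kept in metadata so nothing is silently lost.
    metadata.update(entry)
    return {name: source}


old_entry = {
    "path": "era5/*.nc",
    "driver": "netcdf",
    "data_type": "RasterDataset",
    "rename": {"t2m": "temp"},
    "meta": {"category": "meteo"},
}
converted = to_intake_style("era5", old_entry)
```

This only covers the easy part; entries whose options have no direct counterpart in the new layout would still need manual handling.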

Lastly, I wouldn't start using a hybrid approach where you can parse both types of catalogs; I don't think it will be cheaper (at least not in the long run).

evetion avatar Mar 24 '22 17:03 evetion

see also #67

DirkEilander avatar Apr 26 '22 08:04 DirkEilander