Add automated pipeline for pulling, processing, and saving to HF/GCP/Wherever
Detailed Description
Instead of relying on someone to manually run the data pipeline to get more satellite data, we should automate it so the dataset grows on its own.
This could be built with something like Prefect or Airflow, layered on top of the current app, or something else entirely. Adding support for it should be fairly self-contained compared to the rest of the codebase.
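As a rough sketch of the shape this could take (the function names and return types here are all hypothetical placeholders, and the scheduler wrapper would be Prefect- or Airflow-specific):

```python
from datetime import datetime, timezone


def pull_new_data(since: datetime) -> list[dict]:
    # Placeholder: fetch raw satellite files newer than `since`
    # from the upstream source (API, object store, etc.).
    return [{"timestamp": since.isoformat(), "values": [1.0, 2.0, 3.0]}]


def process(raw: list[dict]) -> list[dict]:
    # Placeholder: standardise/rechunk into the layout the
    # Zarr store expects.
    return raw


def save(processed: list[dict]) -> int:
    # Placeholder: append to the Zarr store on HF/GCP/wherever,
    # returning how many records were written.
    return len(processed)


def run_pipeline(since: datetime) -> int:
    # This is the unit a Prefect flow or Airflow DAG would call
    # on a schedule (e.g. daily), with each step as a task so
    # failures and retries are visible per stage.
    raw = pull_new_data(since)
    processed = process(raw)
    return save(processed)


if __name__ == "__main__":
    written = run_pipeline(datetime(2023, 1, 1, tzinfo=timezone.utc))
    print(f"wrote {written} records")
```

Keeping the pull/process/save steps as plain functions means the same code can be run manually (as today) or wrapped by whichever orchestrator we pick.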
Context
I manually pull new data every once in a while, but running the pipeline on a schedule would keep the data more up to date, and standardizing the process would make it easier to add stricter checks than my current approach of eyeballing random examples and running simple checks (no NaNs, etc.).
Possible Implementation
We would want to make sure the data is sensible, so maybe add something like https://github.com/great-expectations/great_expectations to check data before it's added to the Zarr store.
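To illustrate the kind of gate this gives us, here is a hand-rolled stand-in for an expectation suite (great_expectations' actual API is different and should be taken from its docs; the check names and thresholds here are made up):

```python
import math


def expect_no_nans(values: list[float]) -> dict:
    # Fail if any value is NaN.
    bad = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
    return {"expectation": "no_nans", "success": bad == 0, "unexpected_count": bad}


def expect_values_between(values: list[float], low: float, high: float) -> dict:
    # Fail if any value falls outside the physically plausible range.
    # (NaN comparisons are False, so NaNs also count as out of range.)
    bad = sum(1 for v in values if not (low <= v <= high))
    return {"expectation": "values_between", "success": bad == 0, "unexpected_count": bad}


def validate_batch(values: list[float], low: float, high: float):
    # Run the whole suite; only append to the Zarr store if every
    # expectation passes, and keep the per-check report for logging.
    results = [expect_no_nans(values), expect_values_between(values, low, high)]
    return all(r["success"] for r in results), results


if __name__ == "__main__":
    ok, report = validate_batch([0.1, 0.5, float("nan")], 0.0, 1.0)
    print(ok, report)
```

The useful part is the pattern: checks are named, results are structured, and a failing batch is rejected before it ever touches the store rather than discovered later by eyeballing.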
Another option worth trying is Pangeo Forge (https://pangeo-forge.org/), which is already used for a lot of NWP and climate data and would seemingly work for all our datasets, except maybe PV.
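Pangeo Forge recipes are built around a function mapping dimension values to source URLs, so most of our per-dataset work would be writing that function. A sketch (the URL template is invented, and the `FilePattern`/`ConcatDim` usage in the comment is from memory of pangeo-forge-recipes and should be checked against current docs, since the API has changed across versions):

```python
from datetime import datetime


def make_url(time: datetime) -> str:
    # Hypothetical upstream layout -- the real template would come
    # from whichever satellite archive we pull from.
    return f"https://example.com/satellite/{time:%Y/%m/%d}/scan_{time:%H%M}.nc"


# Pangeo Forge builds recipes around exactly this kind of function,
# roughly along the lines of:
#
#   from pangeo_forge_recipes.patterns import FilePattern, ConcatDim
#   times = [...]  # list of datetimes to concatenate along "time"
#   pattern = FilePattern(make_url, ConcatDim("time", times))
#
# with the recipe then opening each file and writing it into a
# Zarr store. (Sketched from memory; verify against the docs.)

if __name__ == "__main__":
    print(make_url(datetime(2023, 1, 1, 12, 0)))
```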