earthaccess
Implement simpler API signatures
The concept of collections vs granules, and instantiating those classes, can be confusing, especially for new users. Perhaps a static method that hides these details would be simpler to use, e.g. (using the upcoming name):
```python
import earthaccess
import xarray as xr

auth = earthaccess.login(strategy="netrc")

# Proposed one-call search: returns every matching granule at once.
granules = earthaccess.get_granules(short_name="ATL06",
                                    cloud_hosted=True,
                                    polygon=((1, 2), (3, 4), ...),  # remaining vertices elided
                                    temporal=("2020-01-01", "2020-12-31"))
ds = xr.open_mfdataset(earthaccess.open(granules, auth=auth))
```
This would be best suited to regional use cases, since all the granule metadata would be downloaded from CMR in one go. Workflows that require bulk downloads could instead use an iterator, e.g.:
```python
import earthaccess

auth = earthaccess.login(strategy="netrc")

# Proposed lazy search: results are fetched page by page.
query = earthaccess.search(short_name="ATL06",
                           cloud_hosted=True,
                           page_size=2000,
                           polygon=((1, 2), (3, 4), ...),  # remaining vertices elided
                           temporal=("2020-01-01", "2020-12-31"))
for granules in query.items():
    # here granules is a result set of up to 2000 granules
    earthaccess.download(granules, "./data/ATL06/", auth=auth)
```
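The following offline sketch of the paging logic makes the contrast between the two proposals concrete. `PagedQuery`, `get_granules` and the `fetch_page(page_num, page_size)` callable are all hypothetical stand-ins, not the earthaccess API. A real client would have `fetch_page` issue a CMR request, and the CMR-style `page_num`/`page_size` paging here is an assumption. The eager `get_granules` simply drains the iterator. That is fine for regional queries, but it holds every result in memory at once.

```python
from typing import Callable, Iterator, List

# Hypothetical page fetcher: (page_num, page_size) -> one page of granule records.
# In a real client this would wrap a CMR search request.
PageFetcher = Callable[[int, int], List[dict]]


class PagedQuery:
    """Hypothetical lazy query that yields results one page at a time."""

    def __init__(self, fetch_page: PageFetcher, page_size: int = 2000):
        self.fetch_page = fetch_page
        self.page_size = page_size

    def items(self) -> Iterator[List[dict]]:
        page_num = 1
        while True:
            page = self.fetch_page(page_num, self.page_size)
            if not page:
                return  # no more results
            yield page
            if len(page) < self.page_size:
                return  # a short page means this was the last one
            page_num += 1


def get_granules(fetch_page: PageFetcher, page_size: int = 2000) -> List[dict]:
    """Hypothetical eager variant: collect every page into a single list."""
    granules: List[dict] = []
    for page in PagedQuery(fetch_page, page_size).items():
        granules.extend(page)
    return granules


# Fake backend with 4,500 "granules" to exercise the paging logic offline.
ALL = [{"id": i} for i in range(4500)]


def fake_fetch(page_num: int, page_size: int) -> List[dict]:
    start = (page_num - 1) * page_size
    return ALL[start:start + page_size]


pages = list(PagedQuery(fake_fetch, page_size=2000).items())
print([len(p) for p in pages])  # [2000, 2000, 500]
```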
I like this approach. However, this method requires knowing the concept ID or short name first. This is fine if you know your dataset or know to go to Earthdata Search, but we need to explain that step. Otherwise getting the short name is "magic".
I totally agree @andypbarrett, we need to explain where this short_name comes from and, what's more, expose the other ways of finding data (search by DOI, for example). These new methods would also abstract away the fact that at the granule level there is no way to query with the cloud_hosted flag. Otherwise, new users unfamiliar with CMR would need to know the provider to make that distinction, i.e. NSIDC_ECS vs NSIDC_CPRD.
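One way the proposed functions could hide the provider distinction is a lookup from a DAAC name to its on-premises and cloud provider IDs. The sketch below is hypothetical: `PROVIDERS`, `resolve_provider` and `granule_query_params` are illustrative names. Only the NSIDC pair comes from this thread, with `NSIDC_CPRD` as the cloud-hosted provider and `NSIDC_ECS` as the on-premises one. A real table would need the full, current list of CMR providers.

```python
from typing import Dict

# Hypothetical DAAC -> provider table. Only the NSIDC entry comes from the
# discussion; a real implementation would need every current CMR provider ID.
PROVIDERS: Dict[str, Dict[str, str]] = {
    "NSIDC": {"on_prem": "NSIDC_ECS", "cloud": "NSIDC_CPRD"},
}


def resolve_provider(daac: str, cloud_hosted: bool) -> str:
    """Translate a user-facing cloud_hosted flag into a CMR provider ID."""
    try:
        entry = PROVIDERS[daac.upper()]
    except KeyError:
        raise ValueError(f"unknown DAAC: {daac!r}") from None
    return entry["cloud"] if cloud_hosted else entry["on_prem"]


def granule_query_params(short_name: str, daac: str, cloud_hosted: bool) -> Dict[str, str]:
    """Build granule-level search parameters without needing a cloud_hosted filter."""
    return {"short_name": short_name, "provider": resolve_provider(daac, cloud_hosted)}


print(granule_query_params("ATL06", "NSIDC", cloud_hosted=True))
```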
I was wondering whether the current search keyword arguments are based on the Earthdata Search API. Perhaps it would be possible to use the STAC API standards for more consistency across libraries (at least for the main spatiotemporal parameters?).
I know it's nitpicky and annoying to have breaking changes like this, but the user experience is nice coming from other libraries like pystac_client / nasa-cmr-stac, and the standard names and accepted formats could let us reuse parsers already implemented in those libraries.
https://github.com/radiantearth/stac-api-spec/tree/main/item-search#query-parameter-table e.g.:
temporal -> datetime
polygon -> intersects
collections -> short_name?
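To show how the table above could be applied, the hypothetical `normalize_search_kwargs` below accepts both spellings and rewrites STAC names to the current keywords. It also splits a STAC `datetime` interval (`start/end`, with `..` for an open end) into a `(start, end)` tuple. It passes the `intersects` geometry and the `collections` value through untouched. Note that STAC `collections` is a list, so mapping it onto a single `short_name` would need more thought.

```python
from typing import Any, Dict, Optional, Tuple

# STAC item-search name -> current earthaccess keyword, per the table above.
# "collections" -> "short_name" is the tentative mapping from this thread.
STAC_ALIASES = {
    "datetime": "temporal",
    "intersects": "polygon",
    "collections": "short_name",
}


def _open_end(value: str) -> Optional[str]:
    """STAC uses '..' (or an empty string) for an open-ended interval side."""
    return None if value in ("", "..") else value


def parse_stac_datetime(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a STAC datetime ("start/end" or a single instant) into a tuple."""
    if "/" not in value:
        return (value, value)
    start, end = value.split("/", 1)
    return (_open_end(start), _open_end(end))


def normalize_search_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Rewrite STAC-style names to the existing keywords; old names pass through."""
    out: Dict[str, Any] = {}
    for key, value in kwargs.items():
        target = STAC_ALIASES.get(key, key)
        if target in out:
            raise TypeError(f"{target!r} was given more than once (directly or via an alias)")
        if key == "datetime" and isinstance(value, str):
            value = parse_stac_datetime(value)
        out[target] = value
    return out


# The STAC spelling yields the same arguments as the current one.
print(normalize_search_kwargs(collections="ATL06", datetime="2020-01-01/2020-12-31"))
# -> {'short_name': 'ATL06', 'temporal': ('2020-01-01', '2020-12-31')}
```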
This is a great idea @scottyhq!! They don't have to be breaking changes: we can add them as aliases in the class methods so they work the same way (without breaking the old names). The only thing that might need some work is processing the GeoJSON geometry for intersects. I definitely see value in using a standard. Down the road we could even use pystac under the hood and make this library more generic and not entirely tied to CMR.
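For the `intersects` part, the GeoJSON processing might look like the sketch below. It assumes earthaccess's `polygon` argument wants a closed, counter-clockwise sequence of `(lon, lat)` pairs. That assumption is based on CMR's rule that polygon points be listed counter-clockwise with the last point repeating the first. GeoJSON (RFC 7946) stores positions as `[longitude, latitude]`. `geojson_to_polygon` is a hypothetical helper. It handles only the exterior ring of a single `Polygon` and tests orientation with the planar shoelace formula, which does not hold for rings crossing the antimeridian.

```python
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]  # (lon, lat)


def _signed_area(ring: List[Point]) -> float:
    """Planar shoelace formula over a closed ring: positive means counter-clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def geojson_to_polygon(geometry: Dict[str, Any]) -> List[Point]:
    """Convert a GeoJSON Polygon's exterior ring to closed, CCW (lon, lat) tuples.

    Holes, MultiPolygons and antimeridian-crossing rings are not handled here.
    """
    if geometry.get("type") != "Polygon":
        raise ValueError("only GeoJSON Polygon geometries are supported")
    # Positions are [lon, lat] or [lon, lat, alt]; any altitude is dropped.
    ring = [(float(lon), float(lat)) for lon, lat, *_ in geometry["coordinates"][0]]
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least three positions")
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    if _signed_area(ring) < 0:
        ring.reverse()  # clockwise input: flip to counter-clockwise
    return ring


clockwise = {"type": "Polygon", "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]]}
print(geojson_to_polygon(clockwise))
```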