Adding earthaccess catalog in Intake 2

Open martindurant opened this issue 2 years ago • 5 comments

I have written a little code that enables calling the earthaccess functions from within Intake. The point is that certain queries and dataset results could then be persisted in catalogs without having to keep code snippets around. Users would still need to register and understand what the query parameters mean.

Do people here think this is a useful thing to do, and does the implementation look OK? Am I right in assuming that the DOI is the best unique identifier of a data product?

martindurant avatar Nov 10 '23 21:11 martindurant

Nice! I haven't used Intake before, but excited to see more integrations :) What would using this look like?

> Am I right in assuming that the DOI is the best unique identifier of a data product?

I think collection_concept_id is going to be the "best" unique identifier (as intended by the CMR API, not necessarily easiest-to-use). Under the hood, earthaccess is translating the doi query to a concept_id query by doing a collection search to get the concept_id.

https://github.com/nsidc/earthaccess/blob/7db2e59fb76d9eea87a343bbf2af505a57c43e10/earthaccess/search.py#L699-L702
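
A minimal sketch of that DOI-to-concept-ID lookup, assuming `earthaccess.search_datasets` accepts a `doi` keyword and that each returned collection exposes a `concept_id()` method. The helper name `doi_to_concept_id` and its injectable `search` argument are hypothetical, added only so the logic can be exercised without network access or Earthdata credentials; this is not the code at the link above.

```python
from typing import Any, Callable, Optional, Sequence


def doi_to_concept_id(
    doi: str,
    search: Optional[Callable[..., Sequence[Any]]] = None,
) -> Optional[str]:
    """Return the concept ID of the first collection matching `doi`, or None."""
    if search is None:
        # Imported lazily so a stand-in search function can be used for testing.
        import earthaccess

        search = earthaccess.search_datasets

    # Collection search by DOI (assumed keyword: `doi`).
    collections = search(doi=doi)
    if len(collections) == 0:
        return None

    # Assumption: each result object provides a concept_id() method.
    return collections[0].concept_id()
```

The returned ID could then be used as the `concept_id` parameter of a follow-up granule search, which is roughly what the DOI query is described as doing internally.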

MattF-NSIDC avatar Nov 10 '23 21:11 MattF-NSIDC

> collection_concept_id is going to be the "best" unique identifier

Thanks, I'll use that.

The usage pattern would look like this:

import intake.readers.catalogs
spec = intake.readers.catalogs.EarthdataCatalogReader(temporal=("2002-01-01", "2002-01-02"), ....)
cat = spec.read()
list(cat) # shows available identifiers, which all have metadata
reader = cat[<identifier>]
ds = reader.read()  # outputs an xr.Dataset

Of course, the flow is nearly the same as what you already have, but the point is that spec and reader, together with their parameters, can be saved in catalogs.
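
To illustrate what "saving the parameters" buys you, here is a rough standard-library sketch that treats a query as plain data, writes it to a file, and replays it later. This is not how Intake stores catalogs; the function names `save_query`, `load_query`, and `run_saved_query` and the JSON layout are hypothetical, chosen only to show the idea.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def save_query(path: str, name: str, params: Dict[str, Any]) -> None:
    """Store one named set of query parameters in a JSON file."""
    file = Path(path)
    catalog = json.loads(file.read_text()) if file.exists() else {}
    catalog[name] = params
    file.write_text(json.dumps(catalog, indent=2))


def load_query(path: str, name: str) -> Dict[str, Any]:
    """Load a named query; JSON stores tuples as lists, so convert them back."""
    params = json.loads(Path(path).read_text())[name]
    return {k: tuple(v) if isinstance(v, list) else v for k, v in params.items()}


def run_saved_query(path: str, name: str, search: Callable[..., Any]) -> Any:
    """Replay a stored query through any search function that takes keywords."""
    return search(**load_query(path, name))
```

With a file written by `save_query("queries.json", "jan2002", {"temporal": ("2002-01-01", "2002-01-02")})`, a later session could call `run_saved_query("queries.json", "jan2002", earthaccess.search_datasets)` without the original snippet being kept around.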

martindurant avatar Nov 10 '23 21:11 martindurant

I am working with provisional ATL07/10 data and would like to set up access to our local repositories. These are pre-decisional data and cannot be added for general access. I have been looking for instructions and/or tutorials on how to set up intake/earthaccess to access local files/repositories, but have not figured it out yet, so I thought I would ask here.

As a note, it has been 5+ years since I worked on setting up any Intake catalogs, so pointers to instructions on setting this up would be helpful. I will be glad to post tutorials and instructions once I get this worked out, but I will first have to get permission for the public release.

ebo avatar Dec 06 '23 07:12 ebo

The general Earth catalog maker for Intake 2 is here: https://github.com/intake/intake/blob/745ebd42db371aa7d0f5d7d2ca8744103532819d/intake/readers/catalogs.py#L623

This calls earthaccess.search_datasets, so I don't know how you would change that to point to local resources.

martindurant avatar Dec 06 '23 14:12 martindurant

Thanks! This gives me a place to start. I'll post something here if I find a workable solution.

ebo avatar Dec 06 '23 15:12 ebo