We need a CLI (command line interface)
It would be handy to use the library in stand-alone mode with a command line interface (CLI); that way we can cover more use cases. The idea is to support most, if not all, of the current features, including search and access.
I need a MOD09GA CMR search client, and I was going to try to implement a CLI for this tool. As per CONTRIBUTING.md, I am creating an issue here to discuss the feature's scope and its fit for this package.
I assume click is the preferred CLI framework? Or is it OK if I use stdlib argparse? I prefer argparse because I know it better and it's part of the standard library.
Shall we define an interface here, and then I can program to that interface?
Program name: earthaccess or earthaccess-cli?
Maybe something like:
```
earthaccess (search ((--csv | --json) (--collection | --granule)) | download) [--keyword KEY --concept-id ID --bbox xmin ymin xmax ymax --output-folder --etc...]
```
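For concreteness, here's a rough argparse sketch of how that synopsis could be wired up; the subcommand and option names just mirror the example above, nothing here is a settled design:

```python
import argparse

# Sketch of the proposed synopsis as argparse subcommands.
parser = argparse.ArgumentParser(prog="earthaccess")
subparsers = parser.add_subparsers(dest="command", required=True)

search = subparsers.add_parser("search", help="query CMR")
fmt = search.add_mutually_exclusive_group()
fmt.add_argument("--csv", action="store_true", help="output results as CSV")
fmt.add_argument("--json", action="store_true", help="output results as JSON")
kind = search.add_mutually_exclusive_group()
kind.add_argument("--collection", action="store_true")
kind.add_argument("--granule", action="store_true")
search.add_argument("--keyword", metavar="KEY")
search.add_argument("--concept-id", metavar="ID")
search.add_argument("--bbox", nargs=4, type=float,
                    metavar=("XMIN", "YMIN", "XMAX", "YMAX"))

download = subparsers.add_parser("download", help="fetch matching granules")
download.add_argument("--concept-id", metavar="ID")
download.add_argument("--output-folder", default=".")

args = parser.parse_args()
```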
I tend to prefer higher abstractions in general, but argparse is fine. I like the options you described in the example; they capture well what earthaccess should be about. Another feature that would be great to have with a CLI is a config file for storing search parameters, and maybe some stats about the returned queries. I'm thinking: if we are mostly interested in a single dataset like MOD09GA and we are going to use earthaccess as a "downloader", then maybe we would like to know the total hits we got from CMR, the last time we ran it, etc. Something like:
config.json

```json
{
  "params": {"...": "all the search parameters"},
  "lastRun": {
    "total_downloaded": 124445,
    "start_date": "1998-10-03T00:00:00",
    "end_date": "2022-10-03T00:00:00",
    "last_updated": "2022-10-03T00:00:00"
  }
}
```
Actually... the config file could be just for searching, and the stats on returned items and downloaded granules from the last run could be dumped into another file. Again, this is just an idea, for having some observability on what we have done with our downloads.
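A minimal sketch of what dumping those last-run stats could look like; the file name and field names are illustrative, not anything earthaccess provides today:

```python
import json
from datetime import datetime, timezone

def write_run_stats(path, total_hits, total_downloaded):
    """Record what the last run did so the next invocation can report it."""
    stats = {
        "total_hits": total_hits,
        "total_downloaded": total_downloaded,
        "last_updated": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w") as f:
        json.dump(stats, f, indent=2)

# Illustrative values only.
write_run_stats("lastrun.json", total_hits=130000, total_downloaded=124445)
```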
> Program name: earthaccess or earthaccess-cli?
I'd definitely prefer earthaccess for the CLI tool name, and it'd be good to define the CLI in `earthaccess/__main__.py` so `python -m earthaccess` would also work.
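For reference, the `__main__.py` part is tiny; this sketch assumes a hypothetical `earthaccess.cli` module exposing `main()`, which doesn't exist yet:

```python
# earthaccess/__main__.py
# Makes `python -m earthaccess` work. `earthaccess.cli.main` is a
# hypothetical entry point; nothing like it exists in the package yet.
from earthaccess.cli import main

if __name__ == "__main__":
    main()
```

A matching `[project.scripts]` entry in pyproject.toml (e.g. `earthaccess = "earthaccess.cli:main"`) would then expose the same entry point as a bare `earthaccess` command.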
And copying over @andypbarrett's message from Slack: https://openscapes.slack.com/archives/C05TMK269HA/p1711473778577889
@asteiker and I were just discussing how to integrate Earthdata Search results with programmatic search and access tools. Earthdata Search generates a wget script. The NSIDC Data Access Tool creates a Python script. I use these kinds of tools a lot. As stand-alone scripts they are fine, but I also often want to modify them: subset, transform, or some other processing.
My thought is that earthaccess can replace this. I like this approach from Copernicus for ERA5 reanalysis data, which creates search parameters based on selections in a GUI. Here I want 00:00, 06:00, 12:00, and 18:00Z 2 m temperature for January 2024 as NetCDF. I have to install cdsapi, but once that is done, I can run these queries and modify the code.
```python
import cdsapi

c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': '2m_temperature',
        'year': '2024',
        'month': '01',
        'day': [
            '01', '02', '03', '04', '05', '06', '07', '08',
            '09', '10', '11', '12', '13', '14', '15', '16',
            '17', '18', '19', '20', '21', '22', '23', '24',
            '25', '26', '27', '28', '29', '30', '31',
        ],
        'time': ['00:00', '06:00', '12:00', '18:00'],
    },
    'download.nc')
```

So what about an earthaccess.retrieve that does the same thing? This would be an API that combines login, search_data, and download. There could also be a streaming option. The API could be the basis for a CLI earthaccess_retrieve tool. Or maybe:

```
$ earthaccess retrieve -c <concept_id> --area --month
```
Notably, ASF also provides a Python bulk download script to get your search results, which is handy and well used.
Another take on that is ASF's Vertex at https://search.asf.alaska.edu, which lets you "export Python" from a search. The exported code provides a way to reproduce your search results via the asf_search Python package (asf_search is similar to earthaccess but focused on SAR data, as SAR needs some specialized search and discovery features that aren't applicable DAAC-wide). The export gives you a Python code block like:
```python
import asf_search as asf

options = {
    'intersectsWith': 'POINT(-140.4916 59.9632)',
    'dataset': 'SENTINEL-1',
    'processingLevel': 'SLC',
    'flightDirection': 'Ascending',
    'beamSwath': 'IW',
    'maxResults': 250
}
results = asf.search(**options)
print(results)
```
I think implementing a feature like this in Earthdata Search that uses earthaccess should be pretty straightforward, and a general download script could just be scriptifying that or providing a config file to pass to an earthaccess CLI.
So it'd be nice to either target this or keep it in mind when developing the CLI tool.
As for a config file, I don't generally like only driving off a config/ini file, so I like this example:
```
earthaccess (search ((--csv | --json) (--collection | --granule)) | download) [--keyword KEY --concept-id ID --bbox xmin ymin xmax ymax --output-folder --etc...]
```
but it'd be nice to set it up so you could also use a config/INI file to pass in the options. argparse supports this, and I'm sure click and others do too.
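For what it's worth, if we go with argparse this is nearly free via `fromfile_prefix_chars`; a minimal sketch (the option names are placeholders):

```python
import argparse

# Arguments prefixed with "@" are read from a file, one argument per line,
# so a saved config can be mixed freely with command-line flags, e.g.
#   $ earthaccess @search.cfg --output-folder ./data
# where search.cfg contains lines like:
#   --concept-id
#   C0000000000-PROVIDER
parser = argparse.ArgumentParser(prog="earthaccess",
                                 fromfile_prefix_chars="@")
parser.add_argument("--concept-id")
parser.add_argument("--output-folder", default=".")
args = parser.parse_args()
```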
I also prefer earthaccess as the CLI name. I am certain I would forget to add -cli all the time.
I think for the Python snippet from Earthdata Search, combining search and download into a single retrieve option would be the better option. I suspect most users will want to access the data in some way rather than just getting search results. That does raise the question of whether a user wants to download or open and stream, but that could be a keyword. Something like...
```python
import earthaccess

options = {
    # ... search parameters ...
    "download": True,  # False would stream data
}
results = earthaccess.retrieve(**options)
```
If downloading, results is a list of file paths. If streaming, it is a list of file objects.
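For illustration, a minimal sketch of what such a retrieve could look like, composed only of existing top-level earthaccess calls (login, search_data, download, open); the retrieve function itself is hypothetical:

```python
import earthaccess

def retrieve(download=True, local_path=".", **search_params):
    """Hypothetical one-shot helper: login + search + access."""
    earthaccess.login()
    results = earthaccess.search_data(**search_params)
    if download:
        return earthaccess.download(results, local_path)  # list of file paths
    return earthaccess.open(results)  # list of file-like objects
```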
> I think for the Python snippet from Earthdata Search, combining search and download into a single retrieve option would be the better option.
I'm not sure I agree, though I'm not opposed to a retrieve method (and we might want that for the CLI anyway). But hiding complexity from users vs. teaching them how to use tools is definitely a balance.
A code snippet similar to our README would be more than sufficient, IMO:
```python
import earthaccess

results = earthaccess.search_data(
    short_name='SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205',
    cloud_hosted=True,
    bounding_box=(-10, 20, 10, 50),
    temporal=("1999-02", "2019-03"),
    count=10
)

# Download the data
files = earthaccess.download(results)

# OR stream data directly to xarray
import xarray as xr
ds = xr.open_mfdataset(earthaccess.open(results))

# OR get a list of URLs
data_links = [granule.data_links(access="external") for granule in results]
```
> I think for the Python snippet from Earthdata Search, combining search and download into a single retrieve option would be the better option.

> I'm not sure I agree, though I'm not opposed to a retrieve method (and we might want that for the CLI anyway). But hiding complexity from users vs. teaching them how to use tools is definitely a balance.
:100: I think the script should line up with the Earthdata Search user experience, i.e. first I search, then I look at my results, then I perhaps revise my search, then I download.
Perhaps the script could even have a review stage where the data to be downloaded is presented to the user and they're offered the option to continue (Y/n), giving them the opportunity to revise the search if they haven't already.
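That review stage could be as simple as the sketch below, using only existing earthaccess calls; the dataset, temporal range, and prompt wording are illustrative:

```python
import earthaccess

earthaccess.login()
results = earthaccess.search_data(
    short_name="MOD09GA",                   # dataset mentioned earlier in this thread
    temporal=("2024-01-01", "2024-01-31"),  # illustrative range
    count=100,
)

# Review stage: show what would be fetched and ask before downloading.
print(f"Found {len(results)} granules.")
answer = input("Continue with download? [Y/n] ").strip().lower()
if answer in ("", "y", "yes"):
    earthaccess.download(results, "./data")
else:
    print("Aborted. Revise your search and run again.")
```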