earthaccess icon indicating copy to clipboard operation
earthaccess copied to clipboard

Expected type of search_data() "bounding_box" seems too narrow

Open arthur-e opened this issue 2 years ago • 9 comments

The following generates a TypeError (Traceback at bottom of post):

results = earthaccess.search_data(
    short_name = 'M2SDNXSLV',
    temporal = ("2023-01", "2023-07"), 
    bounding_box = [2, 35, 4, 37]) # NOTE: A list

While this example works as expected (no error):

results = earthaccess.search_data(
    short_name = 'M2SDNXSLV',
    temporal = ("2023-01", "2023-07"), 
    bounding_box = (2, 35, 4, 37)) # NOTE: A tuple

It's not immediately obvious from the documentation that bounding_box must be a tuple.

Maybe there is a good reason that a list type is not acceptable? It seems to me like list and tuple should both be accepted.

Traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[68], line 1
----> 1 results = earthaccess.search_data(
      2     short_name = 'M2SDNXSLV',
      3     temporal = ("2023-01", "2023-07"), 
      4     bounding_box = [2, 35, 4, 37])
      6 time_series = []
      7 file_list = earthaccess.open(results)

File /usr/local/python-env/OpenClimate/lib/python3.8/site-packages/earthaccess/api.py:103, in search_data(count, **kwargs)
     65 def search_data(
     66     count: int = -1, **kwargs: Any
     67 ) -> List[earthaccess.results.DataGranule]:
     68     """Search dataset granules using NASA's CMR.
     69 
     70     [https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html)
   (...)
    101         ```
    102     """
--> 103     query = DataGranules().parameters(**kwargs)
    104     granules_found = query.hits()
    105     print(f"Granules found: {granules_found}")

File /usr/local/python-env/OpenClimate/lib/python3.8/site-packages/earthaccess/search.py:347, in DataGranules.parameters(self, **kwargs)
    345         methods[key](*val)
    346     else:
--> 347         methods[key](val)
    349 return self

TypeError: bounding_box() missing 3 required positional arguments: 'lower_left_lat', 'upper_right_lon', and 'upper_right_lat'

arthur-e avatar Aug 15 '23 16:08 arthur-e

Good catch! Looks like we're specifically checking for tuples and splatting them. If we were also splatting lists in the same way, that would solve this issue, but would it possibly introduce another issue? @betolink

https://github.com/nsidc/earthaccess/blob/95253c01f37048f4eea780e08be9d9dca2dd9c0d/earthaccess/search.py#L344C13-L345C35

:question: Shouldn't GitHub be expanding that link to embed? What am I doing wrong? :laughing:

MattF-NSIDC avatar Aug 15 '23 19:08 MattF-NSIDC

It would be useful to accept a list where we currently have tuples, the reason to use tuples is that python_cmr uses them, but lists are more intuitive for sure.

betolink avatar Aug 15 '23 20:08 betolink

We'll move forward with updating https://earthaccess.readthedocs.io/en/latest/user-reference/api/api/#earthaccess.api.search_data for both temporal and bounding box to clarify that tuple is required.

asteiker avatar Feb 06 '24 19:02 asteiker

We also discussed coming back to handling list-like objects with 4 elements at a future time.

mfisher87 avatar Feb 06 '24 19:02 mfisher87

Circling back with some research from another ticket: We can achieve this with Pydantic's runtime validation decorator: https://docs.pydantic.dev/latest/concepts/validation_decorator/#argument-types

mfisher87 avatar Feb 06 '24 21:02 mfisher87

Not sure if this is helpful, but a couple years ago we separated out the temporal and spatial handling in icepyx to their own modules. Each module (here the examples are spatial) parses the user input (e.g. bbox lists, polygons, and shapefiles are accepted for the spatial ones), verifies that it's a valid geometry, and stores it as a list of coordinates. Additional functions make it easy to then return that verified spatial info in whatever format is needed (a geodataframe; a string formatted for cmr (or EGI); a string formatted for another API). I'm not sure if this is a good programmatic approach from a dev perspective, but I'd be happy to brainstorm if this sort of thing would be a good fit for earthaccess to allow users a broader set of input types (and if it makes sense to push these modules upstream).

JessicaS11 avatar Feb 08 '24 19:02 JessicaS11