cmr-stac

some /search GET requests are broken

Open hrodmn opened this issue 1 year ago • 1 comments

It appears that there is a problem in the API when it comes to parsing GET request parameters.

Related issue: #348

Working - this request returns an item collection with one item and shows that there are 30 results for the query:

curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSL30_2.0" | jq

Not working - an almost identical request but with two collections separated by commas returns an empty item collection and shows that there are zero results for the query:

curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSL30_2.0,HLSS30_2.0"

I don't know exactly what is going on here but it seems like the collections values are not getting correctly parsed as a comma-separated list.
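For reference, here is a minimal offline sketch in Python of how the comma-separated form could be expected to parse. `parse_collections` is a hypothetical helper written for illustration, not cmr-stac's actual parsing code. As an assumption, it also accepts repeated `collections` and `collections[]` parameters, two other common query-string conventions for lists.

```python
from urllib.parse import parse_qs, urlsplit


def parse_collections(url: str) -> list[str]:
    """Return the collection IDs requested in a search URL.

    Hypothetical helper: accepts `collections=a,b`, repeated
    `collections=a&collections=b`, and `collections[]=a&collections[]=b`.
    """
    query = parse_qs(urlsplit(url).query)
    raw_values = query.get("collections", []) + query.get("collections[]", [])
    # Split every value on commas so all three encodings give the same list.
    return [cid for value in raw_values for cid in value.split(",") if cid]


base = "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?limit=1&"
print(parse_collections(base + "collections=HLSL30_2.0,HLSS30_2.0"))
print(parse_collections(base + "collections=HLSL30_2.0&collections=HLSS30_2.0"))
print(parse_collections(base + "collections[]=HLSL30_2.0&collections[]=HLSS30_2.0"))
```

Each call prints the same two-element list, which is the value the search would need to see for the failing request to match both collections.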

POST requests are working normally:

Working - a POST request with the same parameters as the failing GET request:

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"bbox": [-105.55, 35.64, -105.31, 35.81], "datetime": "2024-01-01T00:00:00Z/2024-09-01T00:00:00Z", "collections": ["HLSL30_2.0", "HLSS30_2.0"], "limit": 1}' \
  "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search" 

But a GET request to the link ("rel": "next") to the next page from the result of the previous request returns an empty item collection:

curl -X GET \
"https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55%2C35.64%2C-105.31%2C35.81&datetime=2024-01-01T00%3A00%3A00Z%2F2024-09-01T00%3A00%3A00Z&collections=HLSL30_2.0%2CHLSS30_2.0&limit=1&cursor=eyJqc29uIjoiW1wibHBjbG91ZFwiLDE3MDQzMDQ0NDgwMzQsMjgzMjIzMjQwNl0iLCJ1bW0iOiJbXCJscGNsb3VkXCIsMTcwNDMwNDQ0ODAzNCwyODMyMjMyNDA2XSJ9"

This is a big problem for clients that page through results using the "rel": "next" links (like pystac_client): they can get the first page of results via a POST request, but they fail to retrieve the rest because the GET requests for the subsequent pages come back empty.
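To illustrate why the failure is silent, here is a rough Python sketch of a client that follows "next" links. It is not pystac_client's actual implementation. `iter_items` and `get_json` are hypothetical names, and the page dictionaries are made-up stand-ins for the responses described above.

```python
from typing import Callable, Iterator, Optional


def iter_items(first_page: dict, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield features from a page and from every page reachable via "next" links."""
    page: Optional[dict] = first_page
    while page is not None:
        yield from page.get("features", [])
        next_href = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
        # No "next" link means the server reports no further pages.
        page = get_json(next_href) if next_href else None


# Stand-in responses (made up for illustration): the POST result has a "next"
# link, but the GET on that link comes back empty.
first = {
    "features": [{"id": "item-1"}],
    "links": [{"rel": "next", "href": "https://example.invalid/search?page=2"}],
}
empty = {"features": [], "links": []}

items = list(iter_items(first, lambda href: empty))
print(len(items))  # 1
```

The loop ends normally after the empty page, so the caller sees a short result set rather than an error.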

hrodmn avatar Sep 24 '24 16:09 hrodmn

Having the same problem I think, opened an issue over here, but guess this repository is more relevant.

bertcoerver avatar Sep 25 '24 08:09 bertcoerver

@hrodmn -- pagination looks back online (#348) ....does that fix this issue up too?

ircwaves avatar Sep 26 '24 20:09 ircwaves

Looks like this issue is with the collections parameter. It doesn't like the comma-separated format. It may work with two "collections[]" parameters:
https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections[]=HLSL30_2.0&collections[]=HLSS30_2.0

But the links generated in the response will have the comma-separated problem again. I can write up a ticket for this.
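One way to confirm which form a generated link uses is to decode its query string offline. The Python sketch below uses the "next" link from the multi-collection POST request earlier in this thread, with the cursor parameter left out for brevity.

```python
from urllib.parse import parse_qs, urlsplit

# "next" link from the multi-collection POST request; cursor omitted for brevity.
next_href = (
    "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search"
    "?bbox=-105.55%2C35.64%2C-105.31%2C35.81"
    "&datetime=2024-01-01T00%3A00%3A00Z%2F2024-09-01T00%3A00%3A00Z"
    "&collections=HLSL30_2.0%2CHLSS30_2.0&limit=1"
)

params = parse_qs(urlsplit(next_href).query)
print(params["collections"])  # ['HLSL30_2.0,HLSS30_2.0']
```

The `%2C` decodes to a comma, so the link carries a single comma-separated `collections` value.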

william-valencia avatar Sep 26 '24 20:09 william-valencia

> Try your get request with two "collections" parameters.

When I tried this the other day I would only get results for the last collection that I provided. I'm not at my computer now so I can't test it, will try again later.

hrodmn avatar Sep 26 '24 20:09 hrodmn

The comma-separated collections lists do indeed seem to be causing the problem here.

If I go one at a time, I can see that there are 82 total matches between the two collections:

HLSL30_2.0

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSL30_2.0" | jq .context
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7855  100  7855    0     0   3785      0  0:00:02  0:00:02 --:--:--  3785
{
  "returned": 1,
  "limit": 1,
  "matched": 30
}

HLSS30_2.0

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSS30_2.0" | jq .context
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8707  100  8707    0     0   7257      0  0:00:01  0:00:01 --:--:--  7261
{
  "returned": 1,
  "limit": 1,
  "matched": 52
}

Both HLSL30_2.0 and HLSS30_2.0

This request should return 82 matches but we get 0:

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSL30_2.0,HLSS30_2.0" | jq .context
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   884  100   884    0     0   1060      0 --:--:-- --:--:-- --:--:--  1059
{
  "returned": 0,
  "limit": 1,
  "matched": 0
}

If I send two separate collections values, I only get results for the last one that I sent (HLSL30_2.0):

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSS30_2.0&collections=HLSL30_2.0" | jq .context
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7855  100  7855    0     0   5331      0  0:00:01  0:00:01 --:--:--  5332
{
  "returned": 1,
  "limit": 1,
  "matched": 30
}

Same result for two collections[] parameters:

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections[]=HLSS30_2.0&collections[]=HLSL30_2.0" | jq .context
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7861  100  7861    0     0   4998      0  0:00:01  0:00:01 --:--:--  5000
{
  "returned": 1,
  "limit": 1,
  "matched": 30
}

GET requests for paginated results are working for requests that only ask for a single collection:

Initial request:

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSS30_2.0" | jq .features[0].id
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8707  100  8707    0     0   7923      0  0:00:01  0:00:01 --:--:--  7929
"HLS.S30.T13SDV.2024003T174731.v2.0"

"rel" = "next" link:

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55,35.64,-105.31,35.81&datetime=2024-01-01T00:00:00Z/2024-09-24T00:00:00Z&limit=1&collections=HLSS30_2.0" | jq .links[-1].href
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8707  100  8707    0     0   5980      0  0:00:01  0:00:01 --:--:--  5980
"https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55%2C35.64%2C-105.31%2C35.81&collections=HLSS30_2.0&datetime=2024-01-01T00%3A00%3A00Z%2F2024-09-24T00%3A00%3A00Z&limit=1&cursor=eyJqc29uIjoiW1wibHBjbG91ZFwiLDE3MDQzMDQ0NDgwMzQsMjgzMjIzMjQwNl0iLCJ1bW0iOiJbXCJscGNsb3VkXCIsMTcwNDMwNDQ0ODAzNCwyODMyMjMyNDA2XSJ9"

$ curl -X GET "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bbox=-105.55%2C35.64%2C-105.31%2C35.81&collections=HLSS30_2.0&datetime=2024-01-01T00%3A00%3A00Z%2F2024-09-24T00%3A00%3A00Z&limit=1&cursor=eyJqc29uIjoiW1wibHBjbG91ZFwiLDE3MDQzMDQ0NDgwMzQsMjgzMjIzMjQwNl0iLCJ1bW0iOiJbXCJscGNsb3VkXCIsMTcwNDMwNDQ0ODAzNCwyODMyMjMyNDA2XSJ9" | jq .features[0].id
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  8844  100  8844    0     0   7719      0  0:00:01  0:00:01 --:--:--  7717
"HLS.S30.T13SDV.2024008T174719.v2.0"
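Since single-collection requests and their "next" links behave correctly, a possible client-side stopgap is to search one collection at a time and merge the results. This is a minimal Python sketch under that assumption. `search_each_collection` and `search_one` are hypothetical names, and the fake results use made-up item IDs sized to match the counts above (30 and 52).

```python
from typing import Callable, Iterable


def search_each_collection(
    collections: Iterable[str],
    search_one: Callable[[str], list[dict]],
) -> list[dict]:
    """Run one search per collection and merge the items, skipping duplicates."""
    merged: list[dict] = []
    seen: set[tuple[str, str]] = set()
    for collection_id in collections:
        for item in search_one(collection_id):
            key = (collection_id, item["id"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


# Stand-in for a real single-collection search; IDs are invented.
fake_results = {
    "HLSL30_2.0": [{"id": f"L30-{i}"} for i in range(30)],
    "HLSS30_2.0": [{"id": f"S30-{i}"} for i in range(52)],
}
items = search_each_collection(fake_results, fake_results.__getitem__)
print(len(items))  # 82
```

In practice `search_one` would wrap whatever call performs a paged single-collection search. The merged list is grouped by collection rather than ordered the way a single combined query would return it.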

hrodmn avatar Sep 27 '24 01:09 hrodmn

Has anyone had time to investigate this issue? It's a silent error for users that are requesting items from multiple collections.

hrodmn avatar Oct 03 '24 13:10 hrodmn

This issue is scheduled to be fixed in the next sprint. The ticket is https://bugs.earthdata.nasa.gov/browse/CMR-10186

william-valencia avatar Oct 03 '24 20:10 william-valencia

> This issue is scheduled to be fixed in the next sprint. The ticket is https://bugs.earthdata.nasa.gov/browse/CMR-10186

Thanks @william-valencia! Unfortunately that ticket is Access Denied. Glad the fix is in.

ircwaves avatar Oct 03 '24 20:10 ircwaves

Fix has been merged and is now in SIT. It will go through the normal deployment process to get to UAT and PROD.

william-valencia avatar Oct 09 '24 21:10 william-valencia

> Fix has been merged and is now in SIT. It will go through the normal deployment process to get to UAT and PROD.

Thank you for the fix! Do you have an estimate of when the change will make it to PROD? I don't know how the deploy process looks in NASA CMR.

hrodmn avatar Oct 10 '24 14:10 hrodmn

resolved by #357

hrodmn avatar Oct 10 '24 14:10 hrodmn