Add `dataset.services()` method to list available Harmony services

Open nikki-t opened this issue 1 year ago • 14 comments

GitHub Issue: 447

Description

List the Harmony services available for a collection. This is a first step toward facilitating the use of services in earthaccess.

Overview of work done

A new services module was created to perform service queries. The results module was updated with a services function, which uses the services module to query a collection's services and returns the provider-id and UMM JSON for each service.

Sample services results:

{
    "provider-id": "POCLOUD",
    "umm": {
        "URL": {
            "Description": "the main access point",
            "URLValue": "https://opendap.earthdata.nasa.gov/"
        },
        "Type": "OPeNDAP",
        "ServiceKeywords": [
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "SPATIAL SUBSETTING"
            },
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "TEMPORAL SUBSETTING"
            },
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "VARIABLE SUBSETTING"
            }
        ],
        "ServiceOrganizations": [
            {
                "Roles": [
                    "DEVELOPER"
                ],
                "ShortName": "UCAR/UNIDATA",
                "LongName": "Unidata, University Corporation for Atmospheric Research"
            }
        ],
        "OperationMetadata": [
            {
                "DistributedComputingPlatform": [
                    "WEBSERVICES"
                ]
            }
        ],
        "Description": "Earthdata OPEnDAP in the cloud",
        "Version": "9",
        "Name": "PO.DAAC Cloud OPeNDAP",
        "ServiceOptions": {
            "Subset": {
                "SpatialSubset": {
                    "BoundingBox": {
                        "AllowMultipleValues": false
                    }
                },
                "TemporalSubset": {
                    "AllowMultipleValues": false
                },
                "VariableSubset": {
                    "AllowMultipleValues": true
                }
            },
            "SupportedReformattings": [
                {
                    "SupportedInputFormat": "NETCDF-4",
                    "SupportedOutputFormats": [
                        "ASCII",
                        "CSV",
                        "NETCDF-3",
                        "NETCDF-4"
                    ]
                },
                {
                    "SupportedInputFormat": "HDF5",
                    "SupportedOutputFormats": [
                        "ASCII",
                        "CSV",
                        "NETCDF-3",
                        "NETCDF-4"
                    ]
                }
            ]
        },
        "MetadataSpecification": {
            "URL": "https://cdn.earthdata.nasa.gov/umm/service/v1.5.2",
            "Name": "UMM-S",
            "Version": "1.5.2"
        },
        "LongName": "PO.DAAC OPeNDADP In the Cloud"
    }
}
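
As a rough illustration of how a result shaped like the sample above might be consumed, the sketch below pulls the supported reformatting options out of one service record. The `service` dict is a trimmed copy of the sample, and `output_formats` is a hypothetical helper written for this example, not part of earthaccess.

```python
# Trimmed copy of the sample services result shown above.
service = {
    "provider-id": "POCLOUD",
    "umm": {
        "Type": "OPeNDAP",
        "Name": "PO.DAAC Cloud OPeNDAP",
        "ServiceOptions": {
            "SupportedReformattings": [
                {
                    "SupportedInputFormat": "NETCDF-4",
                    "SupportedOutputFormats": ["ASCII", "CSV", "NETCDF-3", "NETCDF-4"],
                },
                {
                    "SupportedInputFormat": "HDF5",
                    "SupportedOutputFormats": ["ASCII", "CSV", "NETCDF-3", "NETCDF-4"],
                },
            ]
        },
    },
}


def output_formats(record: dict) -> dict[str, list[str]]:
    """Map each supported input format to its output formats.

    Hypothetical helper: missing keys yield an empty mapping instead of
    raising, since not every service record carries ServiceOptions.
    """
    options = record.get("umm", {}).get("ServiceOptions", {})
    return {
        item["SupportedInputFormat"]: item.get("SupportedOutputFormats", [])
        for item in options.get("SupportedReformattings", [])
    }


formats = output_formats(service)
print(service["umm"]["Name"], formats)
```

Using `.get` with defaults is a design choice for this sketch: a record without reformatting options simply produces an empty mapping.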

Overview of verification done

Tested new service functionality on five collections:

  • HLSS30
  • MUR-JPL-L4-GLOB-v4.1
  • VIIRS_NPP-JPL-L2P-v2016.2
  • ATL22
  • TOMSN7SO2

Overview of integration done

Created a new unit test to test DataService get. Unit test coverage:

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                                    Stmts   Miss  Cover   Missing
---------------------------------------------------------------------
earthaccess/__init__.py                    31      3    90%   77-81
earthaccess/api.py                        107     77    28%   26-28, 65-78, 117, 124, 143-158, 182-193, 211-213, 234-239, 248-252, 261-265, 284-285, 307-308, 328-336, 345-346, 350-355
earthaccess/auth.py                       207     80    61%   17-18, 49-63, 95-96, 122-154, 188-206, 211-215, 242, 251-252, 260-270, 316-317, 346-358, 362-372, 384
earthaccess/daac.py                        20      6    70%   136, 140-147
earthaccess/formatters.py                  17     10    41%   11, 18, 22-59
earthaccess/kerchunk.py                    32     26    19%   13-22, 32-58
earthaccess/results.py                    152     81    47%   28-31, 34-47, 88-97, 107-109, 117, 124-130, 137-139, 146-148, 155-158, 165-166, 174-176, 184-200, 212-220, 223, 258-261, 268-276, 283-284, 287-290, 306-317, 321-329, 355, 379-380
earthaccess/search.py                     305    139    54%   47, 63-72, 88-89, 99-100, 114, 129-133, 146-150, 165-180, 184-186, 194-195, 203-204, 217, 235-236, 244, 276-312, 374, 392-396, 421, 441-442, 450, 458-464, 474-475, 489-498, 511-516, 525-526, 534-535, 543-544, 552-553, 564-565, 573-574, 582, 588, 628, 634-638, 641, 650, 654, 660-662, 665, 678-679, 721-722, 731-732, 741-742, 772-773, 782-783, 795-803
earthaccess/services.py                    34      6    82%   31, 51, 57-58, 61, 66
earthaccess/store.py                      289    190    34%   27-28, 31, 34, 42, 50-55, 62-75, 82-86, 109-114, 118-121, 124-125, 128-131, 134-137, 148, 156-159, 183-190, 193-194, 214, 224, 239-242, 292, 310-312, 331, 340-385, 394-440, 467-478, 506, 516-536, 546-582, 595-618, 634-649, 656-665
earthaccess/utils/_validation.py            5      3    40%   5-7
earthaccess/widgets.py                      0      0   100%
tests/unit/__init__.py                      0      0   100%
tests/unit/conftest.py                      0      0   100%
tests/unit/test_auth.py                    60      2    97%   133-134
tests/unit/test_collection_queries.py      30      0   100%
tests/unit/test_formatters.py               0      0   100%
tests/unit/test_granule_queries.py         22      0   100%
tests/unit/test_results.py                 22      0   100%
tests/unit/test_store.py                   58      0   100%
---------------------------------------------------------------------
TOTAL                                    1391    623    55%

========================================================================================================= 26 passed, 2 warnings in 1.75s =========================================================================================================
+ bash ./scripts/lint.sh
+ mypy earthaccess --disallow-untyped-defs
Success: no issues found in 12 source files
+ ruff check .

Created a new integration test that tests the services functionality from the service query to the parsing of query results. Integration test coverage:

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                                        Stmts   Miss  Cover   Missing
-------------------------------------------------------------------------
earthaccess/__init__.py                        31      4    87%   68-72, 75
earthaccess/api.py                            107     33    69%   77, 124, 144-152, 190-192, 213, 237-238, 248-252, 261-265, 284-285, 331-334, 336, 345-346
earthaccess/auth.py                           207     59    71%   17-18, 122-154, 188-206, 211-212, 242, 251-252, 260-261, 266, 269, 346-358, 362-372, 384
earthaccess/daac.py                            20      2    90%   136, 147
earthaccess/formatters.py                      17     10    41%   11, 18, 22-59
earthaccess/kerchunk.py                        32     26    19%   13-22, 32-58
earthaccess/results.py                        152     58    62%   28-31, 34-47, 88-97, 107-109, 124-130, 137-139, 146-148, 155-158, 165-166, 174-176, 193, 223, 258-261, 268-276, 283-284, 287-290, 316-317, 321-329, 355, 379-380
earthaccess/search.py                         305     81    73%   69-70, 88-89, 114, 129-133, 146-150, 172, 184-186, 194-195, 203-204, 217, 235-236, 284, 290-294, 297, 304, 392-396, 421, 441-442, 459, 474-475, 490, 511-516, 525-526, 543-544, 552-553, 574, 582, 628, 634-638, 641, 650, 662, 678-679, 721-722, 731-732, 741-742, 772-773, 782-783, 795-803
earthaccess/services.py                        34      6    82%   31, 51, 57-58, 61, 66
earthaccess/store.py                          289     89    69%   34, 42, 62-75, 109-114, 118-121, 124-125, 128-131, 159, 183-190, 193-194, 214, 224, 241-242, 312, 331, 345, 373-374, 383-385, 394-440, 468-470, 478, 506, 523-536, 596, 612-617, 635, 637, 660-661
earthaccess/utils/_validation.py                5      0   100%
earthaccess/widgets.py                          0      0   100%
tests/integration/conftest.py                   9      5    44%   8-12
tests/integration/test_api.py                  51      0   100%
tests/integration/test_auth.py                 82     15    82%   56, 64-65, 87-95, 106, 117-118
tests/integration/test_cloud_download.py       80     13    84%   88-89, 130-133, 141-142, 147-156, 169
tests/integration/test_cloud_open.py           82      5    94%   101, 134-135, 159, 169
tests/integration/test_kerchunk.py             41     33    20%   11-87
tests/integration/test_onprem_download.py      82      5    94%   94, 127-128, 159, 161
tests/integration/test_onprem_open.py          75      4    95%   93, 126-127, 151
tests/integration/test_services.py             24      0   100%
-------------------------------------------------------------------------
TOTAL                                        1725    448    74%

================================================================================================================= short test summary info ==================================================================================================================
FAILED tests/integration/test_api.py::test_download[True-0] - ValueError: earthaccess can't yet guess the provider for cloud collections, we need to use one from earthaccess.list_cloud_providers()
FAILED tests/integration/test_api.py::test_download[True-selection1] - ValueError: earthaccess can't yet guess the provider for cloud collections, we need to use one from earthaccess.list_cloud_providers()
FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location0] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location1] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac0] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac1] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac2] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac3] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac4] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_multi_file_granule - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_cloud_open.py::test_multi_file_granule - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac0] - AssertionError: False is not true
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac1] - AssertionError: False is not true
============================================================================================= 18 failed, 52 passed, 1 skipped, 3 warnings in 174.40s (0:02:54) =============================================================================================

PR checklist:

  • [X] Linted & Formatted
  • [X] Updated unit & integration tests
  • [X] Updated changelog & readme
  • [X] Updated documentation

Pending:

  • [ ] Pass currently failing integration tests.

📚 Documentation preview 📚: https://earthaccess--500.org.readthedocs.build/en/500/

nikki-t avatar Mar 25 '24 15:03 nikki-t

It seems like the existing integration tests are failing due to credentials which I had set up as environment variables and in a .netrc file when running them. What is the best way to handle credentials in these tests? Also happy to move this to a draft if we want to work out those details first.

nikki-t avatar Mar 25 '24 15:03 nikki-t

Hope you don't mind the rename, was finding the old name difficult to remember in my notifications :)

mfisher87 avatar Mar 25 '24 15:03 mfisher87

> What is the best way to handle credentials in these tests?

@betolink @andypbarrett I think this may need documenting! As a developer, how do I run integration tests on my laptop?
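
One possible pre-flight check before running the integration tests locally, sketched under the assumption that credentials are supplied through the `EARTHDATA_USERNAME` and `EARTHDATA_PASSWORD` environment variables. `missing_credentials` is a hypothetical helper, not something the project provides, and it does not cover the .netrc route mentioned earlier in the thread.

```python
import os

# Assumption: these are the environment variables the integration tests read.
REQUIRED_VARS = ("EARTHDATA_USERNAME", "EARTHDATA_PASSWORD")


def missing_credentials(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: pass a mapping to check something other than
    the real process environment (handy when testing the check itself).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("Credentials found in the environment.")
```

A check like this could also back a pytest fixture that calls `pytest.skip(...)` when the list is non-empty, so that a missing login shows up as a skip rather than as an assertion failure.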

mfisher87 avatar Mar 25 '24 15:03 mfisher87

@mfisher87 - I think that I was able to fix the integration test and also modified the services unit and integration tests to use VCR. All unit tests are passing.

I ran the integration tests locally and here is the summary:

==================== short test summary info =================================

FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location0] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location1] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

====== 7 failed, 71 passed, 1 skipped in 496.39s (0:08:16) ==========================================

I wonder if these tests need to be run in the same AWS region as the data?

Other than that I think this PR can be merged but let me know how that typically works for earthaccess development.

nikki-t avatar Apr 26 '24 20:04 nikki-t

Will you be at hack day this coming week? I may not have time to review this before then :grimacing:

mfisher87 avatar Apr 27 '24 01:04 mfisher87

@mfisher87 - Unfortunately I am on travel this coming week so I can't make it but I do plan to attend the next earthaccess hack day in a few weeks. We can discuss then if it's easier! 😄

nikki-t avatar Apr 28 '24 17:04 nikki-t

Sounds like a good plan :) Safe and fun travels!

mfisher87 avatar Apr 28 '24 18:04 mfisher87

@nikki-t I'll only attend 2nd half of hack day this week, had to schedule a dentist appointment at that time. I may be co-working on that call (I've used my funded Openscapes allocation), but feel free to interrupt me so we can have a quick chat. I love the how-to you added :star_struck:

mfisher87 avatar May 14 '24 15:05 mfisher87

@JessicaS11 - Thank you for the suggested updates, I have applied them!

@mfisher87 , @betolink - Do you think this PR is ready to be merged?

nikki-t avatar Jun 03 '24 18:06 nikki-t

Hi @nikki-t, the code looks good to me. There are some failing tests due to a print statement, which I'm not opposed to. @mfisher87 do you know if there is a way to skip this rule on a file?

betolink avatar Jun 03 '24 19:06 betolink

> Hi @nikki-t, the code looks good to me. There are some failing tests due to a print statement, which I'm not opposed to. @mfisher87 do you know if there is a way to skip this rule on a file?

Please use logging, not printing. We addressed this with https://github.com/nsidc/earthaccess/issues/511.
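
For illustration, a minimal sketch of swapping a `print` call for the standard `logging` module. The function name and message are hypothetical and are not taken from the PR.

```python
import logging

# Module-level logger, named after the module that defines it.
logger = logging.getLogger(__name__)


def report_service_error(status_code: int, body: str) -> None:
    """Hypothetical helper showing the print-to-logging swap."""
    # Before: print(f"Service query failed: {status_code} {body}")
    # After: lazy %-style arguments, formatted only if the record is emitted.
    logger.warning("Service query failed: %s %s", status_code, body)
```

With this pattern the library does not write to stdout on its own; callers decide whether and where the message appears by configuring handlers and levels.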

chuckwondo avatar Jun 03 '24 19:06 chuckwondo

Ahh right @chuckwondo cc @nikki-t

betolink avatar Jun 03 '24 19:06 betolink

Suggested modifications have been made and unit and integration tests have been updated.

Unit Test Summary

================================================================================================================ short test summary info =================================================================================================================
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_results.py::TestResults::test_data_links - ValueError: time data '2020' does not match format '%Y-%m-%dT%H:%M:%SZ'
============================================================================================================== 9 failed, 31 passed in 2.56s ==============================================================================================================

Integration Test Summary

================================================================================================================ short test summary info =================================================================================================================
FAILED tests/integration/test_auth.py::test_auth_can_read_earthdata_env_variables - AttributeError: 'Auth' object has no attribute 'username'
FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac1] - AssertionError: False is not true
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac3] - AssertionError: 0 not greater than 3
FAILED tests/integration/test_onprem_open.py::test_earthaccess_can_open_onprem_collection_granules[daac3] - AssertionError: 0 not greater than 2
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_results.py::TestResults::test_data_links - ValueError: time data '2020' does not match format '%Y-%m-%dT%H:%M:%SZ'
============================================================================================ 18 failed, 66 passed, 1 skipped, 1 warning in 596.56s (0:09:56) =============================================================================================

It looks like a lot of the tests are failing due to the date format. I pulled in the most recent changes from main to see if they had been fixed, but it doesn't look like they have. I also typically see some test failures when running the integration tests locally.
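
The date-related failures above all share one error message. It can be reproduced in isolation, assuming the inputs end up in `datetime.strptime` with the format string shown in the message; the thread does not confirm where in earthaccess or its dependencies that call happens. `parses_strictly` is a hypothetical helper for this sketch.

```python
from datetime import datetime

# Format string copied from the ValueError messages in the test output.
STRICT_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parses_strictly(value: str) -> bool:
    """Return True only if value matches STRICT_FORMAT exactly."""
    try:
        datetime.strptime(value, STRICT_FORMAT)
    except ValueError:
        return False
    return True


# Inputs taken from the failing test IDs, plus one fully qualified timestamp.
samples = [
    "2001-12-12",
    "1999-02-01 06:00",
    "2019-03-10T00:00:00-01:00",
    "2020",
    "2001-12-12T00:00:00Z",
]

for value in samples:
    print(f"{value!r}: {parses_strictly(value)}")
```

Only the last sample carries every field the format requires, including the literal `T` separator and trailing `Z`, which matches the pattern of failures in the summaries.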

@mfisher87 or @chuckwondo - Do you mind approving the workflows so that they can run? If these pass I think we might be in good shape to merge the PR.

nikki-t avatar Jun 11 '24 15:06 nikki-t

Workflows kicked off! You should not have to worry about that... I sent you an invitation with "triage" rights (includes ability to add labels, close issues, etc., but not quite as much as "maintainer") you can accept or decline :)

mfisher87 avatar Jun 11 '24 16:06 mfisher87

If this isn't merged before next hack day, let's do it then? How are you feeling @chuckwondo ?

mfisher87 avatar Jul 17 '24 23:07 mfisher87