Harmony errors when no new data are available

Open asteiker opened this issue 5 months ago • 0 comments

The PO.DAAC data subscriber was recently tested at the NSIDC DAAC in order to provide resources to our users as on-prem subscription functionality is being decommissioned. We found a few issues and wanted to pass along some suggested updates. Thank you to @albo8953 for doing this testing!

To test subsetting, we used this example:

podaac-data-subscriber -c ATL03 -d data_sub_test -p NSIDC_CPRD -sd 2025-03-02T23:38:00Z -ed 2025-06-24T00:00:00Z -b="-170,64,-166,67" --subset

which successfully kicked off a Harmony job and downloaded a subsetted .h5 file. The output was:

[2025-06-24 07:58:08,803] {podaac_data_subscriber.py:183} WARNING - No .update__ATL03 in the data directory. (Is this the first run?)
[2025-06-24 07:58:15,532] {subsetting.py:102} INFO - Waiting for Harmony subsetting job to complete...
data_sub_test\102433327_ATL03_20250302234149_11742605_006_01_subsetted.h5
[2025-06-24 08:07:44,795] {podaac_data_subscriber.py:298} INFO - END

However, subsequent runs failed. When the data subscriber finds no new granules, it still tries to submit a Harmony job, this time with an empty granule_id list, and the run ends with:

[2025-06-24 08:44:05,988] {podaac_data_subscriber.py:236} INFO - 0 new granules found for ATL03 since 2025-06-24T14:37:31Z
[2025-06-24 08:44:05,988] {podaac_access.py:525} INFO - https://cmr.earthdata.nasa.gov/search/collections.umm_json?provider=NSIDC_CPRD&ShortName=ATL03&token=****
[2025-06-24 08:44:08,286] {podaac_access.py:863} INFO - Submitting Harmony subsetting job with parameters {'collection': <harmony.harmony.Collection object at 0x0000025DFFB42DE0>, 'skip_preview': True, 'granule_id': [], 'ignore_errors': True, 'temporal': {'start': datetime.datetime(2025, 3, 2, 23, 38, tzinfo=tzutc()), 'stop': datetime.datetime(2025, 6, 24, 14, 22, 55, tzinfo=tzutc())}, 'spatial': BBox: West:-170.0, South:64.0, East:-166.0, North:67.0}
[2025-06-24 08:44:08,286] {podaac_data_subscriber.py:402} ERROR - Uncaught exception occurred during execution.
Traceback (most recent call last):
  File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\podaac_data_subscriber.py", line 400, in main
    run()
  File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\podaac_data_subscriber.py", line 264, in run
    success_cnt, failure_cnt = subsetting.subset(
                               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\subsetting.py", line 82, in subset
    job_id = pa.subset(
             ^^^^^^^^^^
  File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\podaac_access.py", line 867, in subset
    job_id = harmony_client.submit(harmony_request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\harmony\harmony.py", line 940, in submit
    response = session.send(self._get_prepared_request(request))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\harmony\harmony.py", line 815, in _get_prepared_request
    params = self._params(request)
             ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\harmony\harmony.py", line 673, in _params
    elif type(val) == list and type(val[0]) != str:
                                    ~~~^^^
IndexError: list index out of range
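
The last frame of the traceback can be reproduced in isolation. The following is a minimal sketch that copies only the condition shown on line 673 of harmony.py; the function names `check_unguarded` and `check_guarded` are hypothetical and are not part of harmony-py or the data subscriber, and the guarded variant is just one possible way to avoid indexing an empty list.

```python
def check_unguarded(val):
    # Same condition as the failing line in the traceback above.
    return type(val) == list and type(val[0]) != str


def check_guarded(val):
    # Hypothetical variant: check for an empty list before indexing it.
    return type(val) == list and len(val) > 0 and type(val[0]) != str


try:
    # An empty granule_id list, as in the logged request parameters.
    check_unguarded([])
except IndexError as exc:
    print(f"IndexError: {exc}")  # prints "IndexError: list index out of range"

print(check_guarded([]))  # prints "False"
```

This suggests the empty 'granule_id': [] in the logged parameters is what triggers the IndexError, independent of the temporal and spatial settings.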

We also tested whether a user could run this from cron. Running it from cron still works despite the error: a run that finds no new data simply fails, while a run that finds new data kicks off a Harmony job and downloads the new subsetted data. In a user's crontab, that would look something like this:

0 6 * * 1 podaac-data-subscriber -c ATL03 -d OUTPUTDIRECTORY -p NSIDC_CPRD -sd STARTDATE -ed $(date -u +'\%Y-\%m-\%dT\%H:\%M:\%SZ') -b="-170,64,-166,67" --subset > /dev/null 2>&1

Suggested updates:

  1. Account for the possibility of no new data when using the --subset flag, so that the subscriber does not try to send a Harmony request.
  2. Reconcile the fact that the data subscriber requires only one of the STARTDATE or ENDDATE values, whereas Harmony needs both (that is why the ugly date command is squeezed into the cron line).
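
Both suggestions could be sketched roughly as follows. This is an illustration under stated assumptions, not the subscriber's actual code: `should_submit_subset`, `fill_temporal_range`, and `DEFAULT_START` are hypothetical names, and the choice of fallback values (current UTC time for a missing end date, a fixed early date for a missing start date) is an assumption that the maintainers may want to handle differently.

```python
from datetime import datetime, timezone

# Hypothetical fallback for a missing start date; not taken from the subscriber.
DEFAULT_START = datetime(1900, 1, 1, tzinfo=timezone.utc)


def should_submit_subset(granule_ids):
    """Suggestion 1: only send a Harmony request when there are new granules."""
    return len(granule_ids) > 0


def fill_temporal_range(start=None, end=None, now=None):
    """Suggestion 2: return a (start, stop) pair with both bounds set."""
    if start is None and end is None:
        raise ValueError("at least one of start or end is required")
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumed defaults: open-ended ranges run from DEFAULT_START or up to now.
    return (start or DEFAULT_START, end or now)


# Example: only a start date is given and no new granules were found.
start, stop = fill_temporal_range(
    start=datetime(2025, 3, 2, 23, 38, tzinfo=timezone.utc)
)
new_granules = []
if should_submit_subset(new_granules):
    print("Submitting subsetting job for", start, "to", stop)
else:
    print("No new granules; skipping Harmony request")
```

With a default like this for the end date, the `$(date -u ...)` substitution in the crontab line would no longer be needed.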

asteiker, Jul 17 '25 16:07