Harmony errors when no new data are available
We recently tested the PO.DAAC data subscriber at the NSIDC DAAC so that we can offer it as a resource to our users as on-prem subscription functionality is decommissioned. We found a few issues and want to pass along some suggested updates. Thank you to @albo8953 for doing this testing!
To test subsetting, we used this example:
podaac-data-subscriber -c ATL03 -d data_sub_test -p NSIDC_CPRD -sd 2025-03-02T23:38:00Z -ed 2025-06-24T00:00:00Z -b="-170,64,-166,67" --subset
This successfully started a Harmony job and downloaded a subsetted .h5 file. The output was:
[2025-06-24 07:58:08,803] {podaac_data_subscriber.py:183} WARNING - No .update__ATL03 in the data directory. (Is this the first run?)
[2025-06-24 07:58:15,532] {subsetting.py:102} INFO - Waiting for Harmony subsetting job to complete...
data_sub_test\102433327_ATL03_20250302234149_11742605_006_01_subsetted.h5
[2025-06-24 08:07:44,795] {podaac_data_subscriber.py:298} INFO - END
However, subsequent runs failed. When the data subscriber finds no new granules, it still submits a Harmony job (with an empty granule_id list), and the run fails with:
[2025-06-24 08:44:05,988] {podaac_data_subscriber.py:236} INFO - 0 new granules found for ATL03 since 2025-06-24T14:37:31Z
[2025-06-24 08:44:05,988] {podaac_access.py:525} INFO - https://cmr.earthdata.nasa.gov/search/collections.umm_json?provider=NSIDC_CPRD&ShortName=ATL03&token=****
[2025-06-24 08:44:08,286] {podaac_access.py:863} INFO - Submitting Harmony subsetting job with parameters {'collection': <harmony.harmony.Collection object at 0x0000025DFFB42DE0>, 'skip_preview': True, 'granule_id': [], 'ignore_errors': True, 'temporal': {'start': datetime.datetime(2025, 3, 2, 23, 38, tzinfo=tzutc()), 'stop': datetime.datetime(2025, 6, 24, 14, 22, 55, tzinfo=tzutc())}, 'spatial': BBox: West:-170.0, South:64.0, East:-166.0, North:67.0}
[2025-06-24 08:44:08,286] {podaac_data_subscriber.py:402} ERROR - Uncaught exception occurred during execution.
Traceback (most recent call last):
File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\podaac_data_subscriber.py", line 400, in main
run()
File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\podaac_data_subscriber.py", line 264, in run
success_cnt, failure_cnt = subsetting.subset(
^^^^^^^^^^^^^^^^^^
File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\subsetting.py", line 82, in subset
job_id = pa.subset(
^^^^^^^^^^
File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\subscriber\podaac_access.py", line 867, in subset
job_id = harmony_client.submit(harmony_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\harmony\harmony.py", line 940, in submit
response = session.send(self._get_prepared_request(request))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\harmony\harmony.py", line 815, in _get_prepared_request
params = self._params(request)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\albo8953\AppData\Local\Programs\Python\Python312\Lib\site-packages\harmony\harmony.py", line 673, in _params
elif type(val) == list and type(val[0]) != str:
~~~^^^
IndexError: list index out of range
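The failing line is the check quoted at the bottom of the traceback: harmony-py indexes val[0] for a list-valued parameter, and the subscriber passed granule_id=[] because nothing new was found. The snippet below reproduces only that one condition in isolation. The wrapper function is a hypothetical stand-in, not harmony-py's actual _params code, and the granule ID is a made-up placeholder.

```python
def list_param_needs_conversion(val):
    # Hypothetical stand-in: evaluates only the condition quoted in the traceback
    # (harmony.py line 673, in _params); it is not harmony-py's actual function.
    return type(val) == list and type(val[0]) != str


# A non-empty list evaluates without error (the ID is a made-up placeholder).
print(list_param_needs_conversion(["G0000000000-EXAMPLE"]))  # False

# The empty granule_id list from the failing run reproduces the error.
try:
    list_param_needs_conversion([])
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```

So any request built with an empty granule_id list fails on the client side before anything is sent to Harmony.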
We also tested whether a user could run the subscriber as a cron job. This works even with the error above: when no new data are found, the run simply fails, but whenever new data are found it starts a Harmony job and downloads the newly subsetted data. A user's crontab entry would look something like this (here running every Monday at 06:00; each % in the date format is escaped as \% because cron otherwise treats it as a newline):
0 6 * * 1 podaac-data-subscriber -c ATL03 -d OUTPUTDIRECTORY -p NSIDC_CPRD -sd STARTDATE -ed $(date -u +'\%Y-\%m-\%dT\%H:\%M:\%SZ') -b="-170,64,-166,67" --subset > /dev/null 2>&1
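As an alternative to embedding the date command (and its \% escapes) in the crontab, a small wrapper script could compute ENDDATE in Python and call the subscriber with the same flags as the cron line. This is a hypothetical sketch: the script and its function names are illustrative, OUTPUTDIRECTORY and STARTDATE remain placeholders, and it assumes podaac-data-subscriber is on the PATH (it only prints the command if it is not).

```python
import shutil
import subprocess
from datetime import datetime, timezone


def utc_now_iso(now=None):
    """Format a timezone-aware time like `date -u +'%Y-%m-%dT%H:%M:%SZ'`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_command(output_dir, start_date, end_date):
    """Return the same podaac-data-subscriber arguments as the cron line."""
    return [
        "podaac-data-subscriber",
        "-c", "ATL03",
        "-d", output_dir,
        "-p", "NSIDC_CPRD",
        "-sd", start_date,
        "-ed", end_date,
        "-b=-170,64,-166,67",  # what the shell passes for -b="-170,64,-166,67"
        "--subset",
    ]


if __name__ == "__main__":
    # OUTPUTDIRECTORY and STARTDATE are placeholders, exactly as in the cron line.
    cmd = build_command("OUTPUTDIRECTORY", "STARTDATE", utc_now_iso())
    if shutil.which(cmd[0]) is None:
        print("podaac-data-subscriber not found on PATH; would run:", " ".join(cmd))
    else:
        # Expect a non-zero exit code on runs that find no new granules.
        result = subprocess.run(cmd)
        print("exit code:", result.returncode)
```

With the placeholders filled in, the crontab entry reduces to something like 0 6 * * 1 python3 run_subscriber.py (run_subscriber.py being a hypothetical file name), with no % escaping needed.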
Suggested updates:
- Handle the case where no new data are found when using the --subset flag, so that the subscriber does not send a Harmony request.
- Reconcile the fact that the data subscriber requires only one of STARTDATE or ENDDATE, whereas Harmony needs both (which is why the cron line above has to compute ENDDATE with a date command).
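A minimal sketch of both suggestions, assuming the subset step receives the list of new granule IDs and the parsed start/end dates as timezone-aware datetimes. The function names are hypothetical, and so is the policy of defaulting a missing end date to the current UTC time (mirroring the cron workaround) while rejecting a missing start date; this is not the subscriber's actual code. The granule_id and temporal keys follow the parameters shown in the Harmony log line above.

```python
from datetime import datetime, timezone


def resolve_temporal(start, end, now=None):
    """Return a temporal dict with both bounds filled in.

    Hypothetical policy: a missing end date defaults to the current UTC time
    (what the cron workaround does by hand); a missing start date is an error,
    since there is no obvious default for it.
    """
    if start is None:
        raise ValueError("a start date is required when using --subset")
    if end is None:
        end = now if now is not None else datetime.now(timezone.utc)
    if end < start:
        raise ValueError("end date is before start date")
    return {"start": start, "stop": end}


def plan_subset(granule_ids, start, end, now=None):
    """Return request parameters to submit, or None when there is nothing to do.

    Checking for an empty granule list up front avoids sending granule_id=[]
    to Harmony, which is what triggers the IndexError in the traceback.
    """
    if not granule_ids:
        return None  # no new data: skip the Harmony request entirely
    return {
        "granule_id": list(granule_ids),
        "temporal": resolve_temporal(start, end, now),
    }
```

With a guard like this, a run that finds nothing new could log that it is skipping subsetting and exit cleanly instead of raising, so a scheduled job would no longer fail on quiet weeks.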