GempakSounding.snxarray fails to return list of soundings
What went wrong?
Reading this sounding file: https://github.com/wx4stg/LRC/blob/main/data/240713.ua
with GempakSounding and attempting to call snxarray() on it gives an error
Operating System
Linux
Version
1.6.2
Python Version
3.12
Code to Reproduce
from metpy.io import GempakSounding
gs = GempakSounding('240713.ua')
gs.snxarray()
Errors, Traceback, and Logs
Cell In[21], line 1
----> 1 gs.snxarray()
File ~/micromamba/envs/LRC/lib/python3.12/site-packages/metpy/io/gempak.py:2196, in GempakSounding.snxarray(self, station_id, station_number, date_time, state, country)
2192 raise KeyError('No stations were matched with given parameters.')
2194 sndno = [(s.DTNO, s.SNDNO) for s in matched]
-> 2196 data = self._unpack_merged(sndno) if self.merged else self._unpack_unmerged(sndno)
2198 soundings = []
2199 for snd in data:
File ~/micromamba/envs/LRC/lib/python3.12/site-packages/metpy/io/gempak.py:1554, in GempakSounding._unpack_unmerged(self, sndno)
1549 for iprm, param in enumerate(parameters['name']):
1550 sounding[part.name][param] = (
1551 np.array(packed_buffer[iprm::nparms], dtype=np.float32)
1552 )
-> 1554 soundings.append(self._merge_sounding(sounding))
1555 return soundings
File ~/micromamba/envs/LRC/lib/python3.12/site-packages/metpy/io/gempak.py:2066, in GempakSounding._merge_sounding(self, parts)
2063 _interp_logp_pressure(merged, self.prod_desc.missing_float)
2065 # Interpolate missing data
-> 2066 _interp_logp_data(merged, self.prod_desc.missing_float)
2068 # Add below ground MAN data
2069 if merged['PRES'][0] != self.prod_desc.missing_float and bgl > 0:
File ~/micromamba/envs/LRC/lib/python3.12/site-packages/metpy/io/gempak.py:241, in _interp_logp_data(sounding, missing)
239 bdata[param] = val[iabove]
240 vlev = sounding['PRES'][i]
--> 241 outdata = _interp_parameters(vlev, adata, bdata, missing)
242 sounding[var1][i] = outdata[var1]
243 if var2 is not None:
File ~/micromamba/envs/LRC/lib/python3.12/site-packages/metpy/io/gempak.py:458, in _interp_parameters(vlev, adata, bdata, missing)
452 between = (((pres1 < pres2) and (pres1 < vlev)
453 and (vlev < pres2))
454 or ((pres2 < pres1) and (pres2 < vlev)
455 and (vlev < pres1)))
457 if not between:
--> 458 raise ValueError('Current pressure does not fall between levels.')
459 elif pres1 <= 0 or pres2 <= 0:
460 raise ValueError('Pressure cannot be negative.')
ValueError: Current pressure does not fall between levels.
I'll debug this later...
It took some time and effort, but I believe I know what is happening. snxarray() fails on the first sounding it finds with an issue. Since these are observed soundings, it is the sounding merging process that fails. The particular sounding in this case is the 12Z sonde from the station NZWP (Whenuapai, New Zealand). Running the file through GEMPAK snlist and processing the data for that time/location, things work just fine. At first I thought there might be some issues with the interpolation that occurs, but further investigation revealed the issue stemmed from the how merging of the WMO encoded groups was implemented.
This particular sounding had a PPCC (Mandatory winds above 100 mb) group, but did not have a TTCC (mandatory data above 100 mb) group. The current implementation of the _merge_sounding() method will insert the missing levels from the PPCC group (the same is done for the PPAA group). GEMPAK, on the other hand, will only merge data from the PPAA/PPCC groups if the levels in those groups already exist within the sounding data structure (coming from the TTAA/TTCC groups, respectively). Some upper air stations only report mandatory wind data. In that case, GEMPAK does not attempt to do interpolation on those levels without temperature/dewpoint data.
The result of all this was that you have missing height data where you would otherwise not (i.e., interp_moist_height() does not have temp/dewpoint data on a few MAN levels). Interpolation then fails due to using bisect_left() with the missing data (-9999) and you get the wrong index for insertion. Whether the use of bisect_left() has a small part to play in the bug is not entirely known. I suspect that it is just not extremely common to find upper air data with a PPCC group but not a TTCC group. Ultimately, adapting the MAN wind merging code to behave like GEMPAK does allows the file to process properly. That would suggest that the usage of bisect_left() works fine as long as the portions of the merging code behave as they should.
In my testing so far, I can read the problem file attached to this issue and all tests still pass. I can draw up a PR to fix this issue fairly quickly. The only question I have before moving forward with that is whether or not to implement any error handling that would warn users of a problematic station within a file but still allow processing of other data. My vote would be to keep error handling as is and just fix as needed. Let me know if you think otherwise.