MetPy

Coordinate identification for 2D latitude/longitude fails in presence of similarly named auxiliary variables

Status: Open. ljoakim opened this issue 8 months ago (4 comments).

What went wrong?

I work with data that have x and y dimension coordinates as well as 2D auxiliary latitude and longitude coordinates. The data arrays also contain other auxiliary variables that conflict with the latitude/longitude coordinates when I try to use the accessor, e.g. da.metpy.latitude.

This seems to stem from the non-CF regex fallback in the check_axis function. When check_axis is called from _generate_coordinate_map(), the fallback marks every auxiliary variable whose name starts with, e.g., y, x, lat, or lon as a candidate for the corresponding accessor property.
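To illustrate the failure mode, here is a minimal, self-contained sketch of a name-only fallback. The patterns are hypothetical stand-ins written for this example, not MetPy's actual regular expressions; they only mimic the behavior described above, where any name beginning with y, x, lat, or lon becomes a candidate.

```python
import re

# Hypothetical name patterns, for illustration only; MetPy's real
# fallback regexes may differ.
NAME_PATTERNS = {
    'y': re.compile(r'^y'),
    'x': re.compile(r'^x'),
    'latitude': re.compile(r'^lat'),
    'longitude': re.compile(r'^lon'),
}


def name_candidates(names, axis):
    """Return every variable name that the name-only fallback accepts for ``axis``."""
    pattern = NAME_PATTERNS[axis]
    return [name for name in names if pattern.match(name)]


coords = ['isobaric', 'y', 'x', 'lat', 'lon',
          'ylike', 'xlike', 'latlike', 'lonlike', 'longseason']
for axis in NAME_PATTERNS:
    # Each real coordinate picks up its look-alike as a second candidate.
    print(axis, name_candidates(coords, axis))
```

Every axis ends up with more than one candidate, which is what forces the later duplicate-resolution step.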

For y and x, the duplicates are resolved later because the correct candidates match dimension names, but this does not happen for latitude and longitude, so the access fails. I'm not sure precisely how to fix it, but would it be possible to defer the regex fallback and use it only if no CF-compliant coordinate is found?
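The asymmetry between x/y and latitude/longitude can be sketched as a tie-break step: when several candidates exist, keep only those whose name is also one of the data variable's dimensions. This is a simplified, hypothetical model of the resolution described above, not MetPy's _resolve_axis_duplicates code; the function name and inputs are invented for the example.

```python
import warnings


def resolve_duplicates(axis, candidates, dims):
    """Pick a single coordinate name for ``axis`` from ``candidates``.

    Simplified model: prefer candidates that are also dimension names;
    if that does not leave exactly one, warn and return None.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    dim_matches = [name for name in candidates if name in dims]
    if len(dim_matches) == 1:
        return dim_matches[0]
    warnings.warn(f'More than one {axis} coordinate present.')
    return None


dims = ('isobaric', 'y', 'x')
print(resolve_duplicates('y', ['y', 'ylike'], dims))  # -> y
# 2D auxiliary lat/lon are not dimensions, so the tie cannot be broken.
print(resolve_duplicates('latitude', ['lat', 'latlike'], dims))  # -> None, plus a UserWarning
```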

I've created a pytest test that reproduces the issue.

Thanks,

Operating System

Linux

Version

1.7.0.post81+g9f59838062

Python Version

3.13.3

Code to Reproduce

# Added in tests/test_xarray.py
def test_coord_identification_with_other_namelike_vars(test_var_multidim_full):
    """Test coord identification works in presence of vars with similar names."""
    test_da = test_var_multidim_full.assign_coords(
        ylike=('y', [0, 0]),
        latlike=(('y', 'x'), [[0, 0], [0, 0]]),
        xlike=('x', [0, 0]),
        lonlike=(('y', 'x'), [[0, 0], [0, 0]]),
    )
    assert test_da['y'].identical(test_da.metpy.y)
    assert test_da['lat'].identical(test_da.metpy.latitude)
    assert test_da['x'].identical(test_da.metpy.x)
    assert test_da['lon'].identical(test_da.metpy.longitude)

Errors, Traceback, and Logs

src/metpy/_warnings.py:29: UserWarning
___________ test_coord_identification_with_other_namelike_vars[all] ____________

test_var_multidim_full = <xarray.DataArray 'Temperature' (isobaric: 2, y: 2, x: 2)> Size: 32B
[8 values with dtype=float32]
Coordinates:
    ti...e:  Initialized analysis product
    GRIB_level_type:               100
    GRIB_VectorComponentFlag:      gridRelative

    def test_coord_identification_with_other_namelike_vars(test_var_multidim_full):
        """Test coord identification works in presence of vars with similar names."""
        test_da = test_var_multidim_full.assign_coords(
            ylike=('y', [0, 0]),
            latlike=(('y', 'x'), [[0, 0], [0, 0]]),
            xlike=('x', [0, 0]),
            lonlike=(('y', 'x'), [[0, 0], [0, 0]]),
        )
>       assert test_da['y'].identical(test_da.metpy.y)
                                      ^^^^^^^^^^^^^^^

tests/test_xarray.py:1521: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/metpy/xarray.py:492: in y
    return self._axis('y')
           ^^^^^^^^^^^^^^^
src/metpy/xarray.py:417: in _axis
    coord_var = self._metpy_axis_search(axis)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
src/metpy/xarray.py:400: in _metpy_axis_search
    coord_map = self._generate_coordinate_map()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
src/metpy/xarray.py:365: in _generate_coordinate_map
    self._resolve_axis_duplicates(axis, coord_lists)
src/metpy/xarray.py:383: in _resolve_axis_duplicates
    _warnings.warn(f'More than one {axis} coordinate present for variable {varname}.')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = ('More than one longitude coordinate present for variable  "Temperature".',)
kwargs = {}, level = 7

    def warn(*args, **kwargs):
        """Wrap `warnings.warn` and automatically set the stack level if not given."""
        level = kwargs.get('stacklevel')
        if level is None:
            level = _find_stack_level()
>       warnings.warn(*args, **kwargs, stacklevel=level)
E       UserWarning: More than one longitude coordinate present for variable  "Temperature".

src/metpy/_warnings.py:29: UserWarning
=========================== short test summary info ============================
FAILED tests/test_xarray.py::test_coord_identification_with_other_namelike_vars[True]
FAILED tests/test_xarray.py::test_coord_identification_with_other_namelike_vars[all]
============================== 2 failed in 0.51s ===============================

ljoakim · Jul 02 '25 09:07

Thanks for the well-described issue. Can you share with us some examples of these real-world auxiliary coordinates that are getting caught by check_axis? That could motivate how we might narrow our regex or avoid the fallback if we decide to.

Either way, you can use .metpy.assign_coordinates() to assign your coordinates explicitly, e.g., adding

test_da = test_da.metpy.assign_coordinates(
    {'time': test_da.time,
     'isobaric': test_da.isobaric,
     'y': test_da.y,
     'x': test_da.x,
     'latitude': test_da.lat,
     'longitude': test_da.lon,}
)

to your test will allow it to pass. Hope this helps, and thanks again for the issue.

dcamron · Jul 02 '25 19:07

Thanks for the response and the suggestion!

Some additional information:

I'm working with regional climate data, CORDEX and CORDEX-CMIP6 (there are some x/y/lat/lon data examples in the documents). The datasets generally use some kind of projection/grid mapping such as "rotated_pole" or "lambert_conformal_conic" and add the 2d latitude and longitude variables as auxiliary coordinates. The auxiliary coordinates have the "standard_name" and "units" attributes properly set. With the metpy accessor I can easily access them, which is great!
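For context, identification from CF metadata alone can be sketched as below. The accepted units strings are the CF conventions' spellings for degrees north and degrees east; the Var container and cf_axis helper are hypothetical and written only for this example.

```python
from dataclasses import dataclass, field

# Unit spellings that the CF conventions accept for latitude and longitude.
CF_UNITS = {
    'latitude': {'degrees_north', 'degree_north', 'degree_N',
                 'degrees_N', 'degreeN', 'degreesN'},
    'longitude': {'degrees_east', 'degree_east', 'degree_E',
                  'degrees_E', 'degreeE', 'degreesE'},
}


@dataclass
class Var:
    """Hypothetical minimal stand-in for a coordinate variable."""
    name: str
    attrs: dict = field(default_factory=dict)


def cf_axis(var, axis):
    """Return True if ``var`` is identified as ``axis`` from its CF attributes alone."""
    if var.attrs.get('standard_name') == axis:
        return True
    return var.attrs.get('units') in CF_UNITS[axis]


lat = Var('lat', {'standard_name': 'latitude', 'units': 'degrees_north'})
longseason = Var('longseason')  # string labels, no attributes
print(cf_axis(lat, 'latitude'), cf_axis(longseason, 'longitude'))  # True False
```

A variable with no attributes, like longseason, can never pass this check; only the name-based fallback can pick it up.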

The situation I'm currently in is specific to a processing pipeline I'm working on, but I think it may occur in other cases too. I receive dataset files that have been preprocessed, with some auxiliary coords added. One such auxiliary coord is a string variable named "longseason" (1D along the time dimension, used for season labels). Nothing but its name suggests that it is a lat/lon coordinate; it has no such attributes. Because the name starts with "lon", it is added to the list of "longitude" coordinate candidates.

I think covering all possible situations with the regex would be difficult, but it could possibly be handled by giving the variables/coords that satisfy the CF/THREDDS conditions a higher priority. I tested adding a flag to check_axis like so:

def check_axis(var, *axes, strict=False):
    ...  # when strict=True, skip the non-CF regex fallback

where the fallback is used only when strict=False, and _generate_coordinate_map calls it as check_axis(coord_var, axis, strict=True) when building the coord_lists. Then again, I assume the fallback is there for a reason and that I'm missing something. Maybe any coord_lists that are still empty at that point could trigger a second pass of check_axis calls with the non-strict check?
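The two-pass idea above might look roughly like this: a strict pass that only trusts metadata, then a name-based pass only for axes that are still empty. This is a hedged sketch of the proposal, not MetPy code; check_axis here is a simplified hypothetical stand-in that takes a single axis and checks only standard_name, and build_coord_lists is invented for the example.

```python
import re

# Hypothetical name fallback patterns (illustrative, not MetPy's).
FALLBACK = {'latitude': re.compile(r'^lat'), 'longitude': re.compile(r'^lon')}


def check_axis(attrs, name, axis, strict=False):
    """Simplified stand-in: metadata check first, name regex only when not strict."""
    if attrs.get('standard_name') == axis:
        return True
    return not strict and bool(FALLBACK[axis].match(name))


def build_coord_lists(coords, axes=('latitude', 'longitude')):
    """Strict pass over all coordinates, then a non-strict pass for empty axes only."""
    coord_lists = {
        axis: [n for n, a in coords.items() if check_axis(a, n, axis, strict=True)]
        for axis in axes
    }
    for axis, found in coord_lists.items():
        if not found:
            # Nothing CF-identifiable: fall back to matching on names.
            found.extend(n for n, a in coords.items() if check_axis(a, n, axis))
    return coord_lists


coords = {
    'lat': {'standard_name': 'latitude'},
    'lon': {'standard_name': 'longitude'},
    'longseason': {},
}
print(build_coord_lists(coords))  # {'latitude': ['lat'], 'longitude': ['lon']}
```

The fallback is kept for datasets without metadata, but it can no longer add a second candidate next to a properly described coordinate.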

This is not a critical issue, it could be argued that such auxiliary data should be kept separately, or named differently, but it would be nice to have it work "seamlessly" for cases like this as well, if possible.

Again, thanks for the effort, I really like the MetPy package.

ljoakim · Jul 03 '25 09:07

I made another test that reflects my case more precisely (using the isobaric dimension in place of time):

def test_coord_identification_with_lonlike_string_aux(test_var_multidim_full):
    test_da = test_var_multidim_full.assign_coords(
        longseason=('isobaric', ['DJF', 'MAM']),
    )
    assert test_da['lon'].identical(test_da.metpy.longitude)

Looking at it, this particular case would be resolved by checking that the variable is of numeric type before applying the fallback in check_axis.
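A numeric-type guard could be sketched like this, assuming the variable's values are available as a plain Python sequence. The helper names and the pattern are hypothetical; a real implementation would more likely inspect the array's dtype than iterate over values.

```python
import re
from numbers import Real

LON_PATTERN = re.compile(r'^lon')  # hypothetical fallback pattern, for illustration


def is_numeric(values):
    """True if every value is a real number (bools excluded)."""
    return all(isinstance(v, Real) and not isinstance(v, bool) for v in values)


def fallback_longitude(name, values):
    """Name-based fallback that first rejects non-numeric variables."""
    return is_numeric(values) and bool(LON_PATTERN.match(name))


print(fallback_longitude('longseason', ['DJF', 'MAM']))  # False: string labels
print(fallback_longitude('lon', [10.0, 10.5]))  # True
```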

ljoakim · Jul 03 '25 11:07

I haven't looked at the code, but I think there are two tweaks we should make here:

  1. Only consider numeric variables as coordinate candidates (at least for lat/lon)
  2. Tweak the regex/heuristic to be more specific so that "longseason" isn't considered for longitude
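As one possible direction for the second tweak, a longitude pattern could require that the lon prefix be followed only by "gitude", an optional underscore, or digits. The pattern below is a hypothetical example, not a proposed MetPy patch; which names it should accept would need checking against real datasets.

```python
import re

# Hypothetical stricter pattern: "lon" or "longitude", optionally followed
# by an underscore and/or digits, and nothing else.
STRICT_LON = re.compile(r'^lon(gitude)?(_?[0-9]+)?$')

names = ['lon', 'longitude', 'lon_0', 'lon2', 'longseason', 'lonlike']
print([n for n in names if STRICT_LON.match(n)])  # ['lon', 'longitude', 'lon_0', 'lon2']
```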

dopplershift · Jul 03 '25 16:07