Explicit check for matching coords

Open bradyrx opened this issue 5 years ago • 8 comments

This has been on my mind for a while. We need something in checks.py that acts as a decorator to check that the input Datasets/DataArrays being compared have the same coordinate labeling (outside of time).

For instance,

import numpy as np
import climpred

hind = climpred.tutorial.load_dataset('CESM-DP-SST-3D')
verif = climpred.tutorial.load_dataset('FOSI-SST-3D')

# Add nlat/nlon coordinates to only one of the two datasets.
hind['nlat'] = np.arange(hind.nlat.size)
hind['nlon'] = np.arange(hind.nlon.size)

climpred.prediction.compute_hindcast(hind, verif, max_dof=True)

This returns

ValueError: indexes along dimension 'nlat' are not equal

I've encountered this a lot when something I'm comparing to drops coords along the way or doesn't have them to begin with. And it makes you think something in climpred is broken during alignment, like we're losing spatial cells. In reality it's just that one has a coordinate for nlat and one doesn't.
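As a rough sketch of what such a check could look like (not existing climpred code): the names `find_coord_mismatches` and `has_matching_coords` are hypothetical, and the inputs are assumed to be xarray Datasets/DataArrays, of which only `.dims` and `.indexes` are used. The idea is to distinguish "one input has no coordinate for this dimension" from "the labels differ" before alignment fails.

```python
import functools


def find_coord_mismatches(a, b, ignore=("time",)):
    """Describe shared dimensions whose coordinate labels disagree.

    Hypothetical helper. `a` and `b` are assumed to be xarray objects;
    a dimension without a coordinate does not appear in `.indexes`.
    """
    problems = {}
    for dim in sorted(set(a.dims) & set(b.dims)):
        if dim in ignore:
            continue
        in_a, in_b = dim in a.indexes, dim in b.indexes
        if in_a != in_b:
            # One input carries labels for this dimension, the other does not.
            problems[dim] = "coordinate present on only one input"
        elif in_a and not a.indexes[dim].equals(b.indexes[dim]):
            problems[dim] = "coordinate labels differ"
    return problems


def has_matching_coords(func):
    """Hypothetical decorator: check the first two positional arguments."""

    @functools.wraps(func)
    def wrapper(first, second, *args, **kwargs):
        problems = find_coord_mismatches(first, second)
        if problems:
            details = "; ".join(f"'{dim}': {why}" for dim, why in problems.items())
            raise ValueError(
                f"Coordinates do not match along dimension(s) {details}"
            )
        return func(first, second, *args, **kwargs)

    return wrapper
```

Decorating a comparison function with `@has_matching_coords` would then raise before xarray's alignment does, with a message that says which case applies.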

bradyrx avatar Jan 27 '20 01:01 bradyrx

Isn’t the current return message from xr self-explanatory for the problem?

aaronspring avatar Jan 27 '20 09:01 aaronspring

I wouldn't say so. At first glance I would think it means that 'nlat' between forecast and verif are not of equal length. It's sort of vague... it would make way more sense if it said something like

ValueError: Coordinates do not match along dimension 'nlat'

bradyrx avatar Jan 27 '20 15:01 bradyrx

This is true: ValueError: Coordinates do not match along dimension 'nlat'. But the xr check is also true. What I want to emphasize is that we don't need to write error messages for all possible errors; I would focus on those where the problem is hidden and not mentioned directly by the error message.

Here climpred did nothing to the dimension nlat. If the problem arose on lead, time, or init, we should definitely explain it. But in a way, nlat is just an additional dimension that climpred didn't touch. Maybe a more explicit error message could be: forecast and reference differ in coordinate 'nlat'

aaronspring avatar Jan 27 '20 18:01 aaronspring

If we find matching coords, we could show them once in the repr as coords shared by all datasets and leave them out of the individual dataset reprs.

aaronspring avatar Jun 05 '20 12:06 aaronspring

Agreed. Maybe upon instantiating the object we also just add coords for all dims that exist on other objects so we don't have to worry about that error.
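A minimal sketch of that idea, assuming xarray objects and using only `.sizes`, `.indexes`, and `assign_coords`. The function name `add_default_coords` is hypothetical, and the 0..n-1 labels are an assumption that mirrors the `np.arange` labels in the example at the top of this issue.

```python
def add_default_coords(obj):
    """Return `obj` with 0..n-1 labels on every dimension that has no coordinate.

    Hypothetical helper; `obj` is assumed to be an xarray Dataset or DataArray.
    """
    missing = {
        # A plain list of positions; xarray turns it into an index coordinate.
        dim: list(range(size))
        for dim, size in obj.sizes.items()
        if dim not in obj.indexes
    }
    # assign_coords returns a new object; the input is left untouched.
    return obj.assign_coords(missing) if missing else obj
```

One caveat: if the other object already carries real labels (say, degrees latitude), positional defaults will not match them, so this only helps when every object is either unlabeled or labeled the same way.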

bradyrx avatar Jun 13 '20 19:06 bradyrx

Maybe we can just throw a warning when dimension coords (coords that are also dimensions) don't match. Scalar coords like member_id='r1...' would be safe to ignore.

aaronspring avatar Jul 25 '20 14:07 aaronspring

Or we don’t do this coords check at all (and only check time, to verify the calendar), because then we could use xesmf regridding to regrid different datasets to 1x1 or 5x5 degree grids, which should then match.

aaronspring avatar Jul 27 '20 10:07 aaronspring

That's a good point regarding people wanting to regrid. Maybe just throw a warning. Whenever you add a dataset, check the union of coords and warn if they don't match.
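Roughly, that could look like the sketch below. The function name `warn_on_coord_mismatch` is hypothetical, and the objects are assumed to be xarray Datasets/DataArrays (only `.dims` and `.indexes` are used). Scalar coords such as member_id are not dimensions, so they are skipped automatically.

```python
import warnings


def warn_on_coord_mismatch(new, existing, ignore=("time",)):
    """Warn if `new` disagrees with any object in `existing` on a shared dimension.

    Hypothetical helper. Returns the sorted list of mismatched dimension names
    so callers can inspect them; nothing is raised.
    """
    mismatched = set()
    for other in existing:
        for dim in set(new.dims) & set(other.dims):
            if dim in ignore:
                continue
            in_new, in_other = dim in new.indexes, dim in other.indexes
            if in_new != in_other:
                # Only one of the two objects has labels for this dimension.
                mismatched.add(dim)
            elif in_new and not new.indexes[dim].equals(other.indexes[dim]):
                mismatched.add(dim)
    if mismatched:
        warnings.warn(
            f"Coordinates differ along dimension(s) {sorted(mismatched)}; "
            "alignment may fail or drop cells. Consider regridding first.",
            UserWarning,
            stacklevel=2,
        )
    return sorted(mismatched)
```

Warning instead of raising leaves room for the regridding workflow: users can add datasets on different grids and bring them onto a common grid afterwards.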

bradyrx avatar Sep 09 '20 15:09 bradyrx