Explicit check for matching coords
This has been an issue on my mind for a while: we need something in checks.py that acts as a decorator and checks that the input Datasets/DataArrays being compared have the same coordinate labels (outside of time).
For instance,
import numpy as np
import climpred
hind = climpred.tutorial.load_dataset('CESM-DP-SST-3D')
verif = climpred.tutorial.load_dataset('FOSI-SST-3D')
# Add nlat/nlon coordinates to the hindcast only.
hind['nlat'] = np.arange(hind.nlat.size)
hind['nlon'] = np.arange(hind.nlon.size)
climpred.prediction.compute_hindcast(hind, verif, max_dof=True)
This raises
ValueError: indexes along dimension 'nlat' are not equal
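That wording matches what xarray reports when an exact join (`join="exact"`) finds that the labels along a shared dimension differ. A minimal reproduction with xarray alone, using two made-up toy arrays whose `nlat` labels disagree (whether climpred reaches the error through exactly this call is an assumption, not shown here):

```python
import numpy as np
import xarray as xr

# Toy stand-ins for a forecast and a verification field: same length along
# 'nlat', but different labels on it.
forecast = xr.DataArray(np.zeros(3), dims="nlat", coords={"nlat": [0, 1, 2]})
verif = xr.DataArray(np.zeros(3), dims="nlat", coords={"nlat": [1, 2, 3]})


def exact_align_error(a, b):
    """Return the message of the ValueError raised by an exact join, or None."""
    try:
        xr.align(a, b, join="exact")
    except ValueError as err:
        return str(err)
    return None


print(exact_align_error(forecast, verif))  # exact wording depends on the xarray version
print(exact_align_error(forecast, forecast))  # None: identical labels align fine
```

Older xarray releases word this as `indexes along dimension 'nlat' are not equal`; newer ones reword it, but both point at the alignment step rather than at the data values.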
I've run into this a lot when one of the objects I'm comparing drops its coords along the way or never had them to begin with. It makes you think something in climpred is broken during alignment, as if we're losing spatial cells. In reality, one object just has a coordinate for nlat and the other doesn't.
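A rough sketch of the proposed checks.py decorator, assuming it only needs to compare the first two positional arguments and to skip `time`. `require_matching_coords` and `toy_skill` are hypothetical names, not existing climpred functions:

```python
import functools

import numpy as np
import xarray as xr


def require_matching_coords(skip=("time",)):
    """Hypothetical decorator: before calling the wrapped function, check that the
    first two positional arguments label their shared dimensions identically
    (dimensions listed in ``skip`` are not checked)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(a, b, *args, **kwargs):
            shared = (set(a.dims) & set(b.dims)) - set(skip)
            for dim in sorted(shared):
                has_a, has_b = dim in a.indexes, dim in b.indexes
                if has_a != has_b:
                    which = "first" if has_a else "second"
                    raise ValueError(
                        f"Coordinates do not match along dimension {dim!r}: "
                        f"only the {which} object has a coordinate for it."
                    )
                if has_a and not a.indexes[dim].equals(b.indexes[dim]):
                    raise ValueError(f"Coordinates do not match along dimension {dim!r}")
            return func(a, b, *args, **kwargs)

        return wrapper

    return decorator


@require_matching_coords()
def toy_skill(forecast, verif):
    # Hypothetical placeholder for a real metric.
    return float(abs(forecast - verif).mean())


hind = xr.DataArray(np.zeros((2, 3)), dims=("nlat", "nlon"), coords={"nlat": [0, 1]})
verif = xr.DataArray(np.ones((2, 3)), dims=("nlat", "nlon"))  # no nlat coordinate

try:
    toy_skill(hind, verif)
    message = None
except ValueError as err:
    message = str(err)
print(message)
# → Coordinates do not match along dimension 'nlat': only the first object has a coordinate for it.
```

Checking before any alignment lets the message name the dimension and say which input lacks the coordinate, which is the case described above.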
Isn't the current error message from xr self-explanatory?
I wouldn't say so. At first glance I'd think it means that 'nlat' has a different length in forecast and verif. It's sort of vague... it would make much more sense if it said something like
ValueError: Coordinates do not match along dimension 'nlat'
That message would be true: ValueError: Coordinates do not match along dimension 'nlat'
But the xr message is also true. What I want to emphasize is that we don't need to write error messages for every possible error: I would focus on those where the problem is hidden and not stated directly by the error message.
Here climpred did nothing to the dimension nlat. If the problem arose on lead, time, or init, we should definitely explain it. But in a way, nlat is just an additional dimension that climpred didn't touch. A more explicit error message could be: forecast and reference differ in coordinate 'nlat'
If we find matching coords, we could show them once in the repr as coords shared by all objects and leave them out of the individual dataset reprs.
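Finding the coords to show once could be as simple as collecting the coordinates that are present and identical on every object. A minimal sketch; `shared_coords` is a hypothetical helper, and comparing only the underlying variables (dimensions and values, not attributes) is an assumption:

```python
import numpy as np
import xarray as xr


def shared_coords(*objs):
    """Hypothetical helper: names of coordinates that exist on every object with
    identical dimensions and values, i.e. candidates to show once in a repr."""
    first, *rest = objs
    return sorted(
        name
        for name in first.coords
        if all(
            name in other.coords
            and first.coords[name].variable.equals(other.coords[name].variable)
            for other in rest
        )
    )


hind = xr.DataArray(
    np.zeros((2, 3)), dims=("lead", "nlat"), coords={"lead": [1, 2], "nlat": [0, 1, 2]}
)
verif = xr.DataArray(
    np.zeros((4, 3)), dims=("time", "nlat"), coords={"time": [0, 1, 2, 3], "nlat": [0, 1, 2]}
)
print(shared_coords(hind, verif))  # → ['nlat']
```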
Agreed. Maybe upon instantiating the object we also just add coords for all dims that exist on other objects so we don't have to worry about that error.
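One way to add the missing coords on instantiation: when only one object labels a shared dimension and the lengths agree, copy that coordinate onto the other. A sketch with a hypothetical `fill_missing_dim_coords` helper; it trusts equal length to mean the same grid, so it can hide a genuine mismatch:

```python
import numpy as np
import xarray as xr


def fill_missing_dim_coords(a, b, skip=("time",)):
    """Hypothetical helper: when exactly one of two objects has a coordinate for
    a shared dimension of equal length, copy it onto the other object."""
    for dim in sorted((set(a.dims) & set(b.dims)) - set(skip)):
        if a.sizes[dim] != b.sizes[dim]:
            continue  # different lengths: leave it for alignment to report
        if dim in a.indexes and dim not in b.indexes:
            b = b.assign_coords({dim: a.indexes[dim].values})
        elif dim in b.indexes and dim not in a.indexes:
            a = a.assign_coords({dim: b.indexes[dim].values})
    return a, b


hind = xr.DataArray(np.zeros((2, 3)), dims=("nlat", "nlon"), coords={"nlat": [10, 20]})
verif = xr.DataArray(np.ones((2, 3)), dims=("nlat", "nlon"))
hind, verif = fill_missing_dim_coords(hind, verif)
print(verif["nlat"].values)  # → [10 20]
```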
Maybe we can just throw a warning when coords that are also dimensions don't match. Singular (scalar) coords like member_id='r1...' would be safe to ignore.
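A sketch of the warning variant, with a hypothetical `warn_on_dim_coord_mismatch`. Only dimensions shared by both objects are checked, so scalar coords such as `member_id` are skipped by construction; the member labels below are made-up placeholders:

```python
import warnings

import numpy as np
import xarray as xr


def warn_on_dim_coord_mismatch(a, b):
    """Hypothetical check: warn (instead of raising) when the coordinates of
    shared dimensions differ or exist on only one object. Scalar coordinates
    are not dimensions, so they are never compared."""
    mismatched = []
    for dim in sorted(set(a.dims) & set(b.dims)):
        idx_a, idx_b = a.indexes.get(dim), b.indexes.get(dim)
        if idx_a is None and idx_b is None:
            continue  # neither object labels this dimension
        if idx_a is None or idx_b is None or not idx_a.equals(idx_b):
            mismatched.append(dim)
    if mismatched:
        warnings.warn(f"Coordinates do not match along dimension(s) {mismatched}", UserWarning)
    return mismatched


grid = {"nlat": [0, 1, 2]}
a = xr.DataArray(np.zeros(3), dims="nlat", coords={**grid, "member_id": "member_a"})
b = xr.DataArray(np.zeros(3), dims="nlat", coords={**grid, "member_id": "member_b"})
c = xr.DataArray(np.zeros(3), dims="nlat")  # same length, no nlat coordinate

print(warn_on_dim_coord_mismatch(a, b))  # → [] (member_id differs but is scalar)
print(warn_on_dim_coord_mismatch(a, c))  # → ['nlat'], plus a UserWarning
```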
Or we don't do this coords checking at all (only for time, to check the calendar), because then we could use xesmf regridding to regrid different datasets onto 1x1 or 5x5 degree grids, which should then match.
That's a good point regarding people wanting to regrid. Maybe just throw a warning. Whenever you add a dataset, check the union of coords and warn if they don't match.
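A sketch of checking on every add: keep a record of every dimension seen so far and warn when a newly added object disagrees with it. `CoordRegistry` and its `add` method are hypothetical, not climpred API, and skipping `time` follows the earlier suggestion to treat time separately:

```python
import warnings

import numpy as np
import xarray as xr


class CoordRegistry:
    """Hypothetical helper: remember each dimension's coordinate from the first
    object that has the dimension, and warn when later objects disagree."""

    def __init__(self, skip=("time",)):
        self.skip = set(skip)
        self.known = {}  # dim name -> pandas.Index, or None if it was unlabeled

    def add(self, name, obj):
        """Register ``obj``; return the dims along which it disagrees."""
        flagged = []
        for dim in obj.dims:
            if dim in self.skip:
                continue
            index = obj.indexes.get(dim)  # None when obj has no coordinate for dim
            if dim not in self.known:
                self.known[dim] = index
                continue
            previous = self.known[dim]
            if previous is None and index is None:
                continue
            if previous is None or index is None or not previous.equals(index):
                flagged.append(dim)
                warnings.warn(
                    f"{name!r} does not match the registered coordinates along {dim!r}",
                    UserWarning,
                )
        return flagged


registry = CoordRegistry()
hind = xr.Dataset({"sst": (("init", "nlat"), np.zeros((2, 3)))}, coords={"nlat": [0, 1, 2]})
verif = xr.Dataset({"sst": (("time", "nlat"), np.zeros((4, 3)))})  # nlat unlabeled

registry.add("hindcast", hind)
print(registry.add("verification", verif))  # → ['nlat']
```

Warning rather than raising keeps the regridding workflow open: a user who has already put both datasets on a common grid just sees nothing.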