Should `guess_bounds` "do what I mean" for Gregorian monthly data?
✨ Feature Request
Iris's coordinate `guess_bounds` method places each bound halfway between adjacent points. For a time coordinate whose points are mid-month in a Gregorian calendar, should Iris instead set the bounds to the start and end of each month?
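To make the halfway rule concrete, here is a minimal NumPy sketch of it. It is an illustration only, not Iris's actual implementation; the function name `midpoint_bounds` is hypothetical, and the end cells are assumed to be extrapolated by reusing the nearest spacing.

```python
import numpy as np


def midpoint_bounds(points):
    """Return (n, 2) bounds placed halfway between neighbouring points.

    Hypothetical helper for illustration; not part of Iris.
    """
    points = np.asarray(points, dtype=float)
    diffs = np.diff(points)
    # Assumption: reuse the first and last spacing to extrapolate the end cells.
    diffs = np.concatenate(([diffs[0]], diffs, [diffs[-1]]))
    lower = points - 0.5 * diffs[:-1]
    upper = points + 0.5 * diffs[1:]
    return np.column_stack([lower, upper])


# Mid-month points for January to March 1994, in hours since 1970-01-01.
bounds = midpoint_bounds([210756.0, 211464.0, 212172.0])
print(bounds)
```

The first row comes out as 210402 to 211110 hours, which is 1994-01-01 18:00 to 1994-01-31 06:00 rather than the calendar month, because the spacing between mid-month points varies with month length.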
Motivation
In ANTS we already have this functionality, but we do not support any other time cases. Iris's `guess_bounds` is more flexible in its time handling, so we'd love to retire our limited time handling in favour of the Iris behaviour. In other words, Iris handles the general case well but does not handle this specific case as a user might expect, while ANTS handles this particular case well but no other cases for time coordinates. Ideally, Iris's `guess_bounds` would give us the best of both worlds.
We can get the best of both worlds in ANTS by delegating to Iris's `guess_bounds`. I think the optimal solution, though, is for this behaviour to be available to all Iris users.
ANTS docs are here, for reference: https://code.metoffice.gov.uk/doc/ancil/ants/latest/lib/ants.utils.html#ants.utils.coord.guess_bounds (they link through to the implementation source code). There is also a unit test for the Gregorian case here: https://code.metoffice.gov.uk/trac/ancil/browser/ants/trunk/lib/ants/tests/utils/coord/test_guess_bounds.py?marks=100-111#L100
This comes up fairly frequently when working with model data.
Additional context
Example of current behaviour with iris 3.2:
In [1]: import iris
In [2]: import iris.coords
In [3]: time = iris.coords.DimCoord(points=[210756., 211464., 212172., 212904., 213636., 214368., 215100.,
   ...: 215844., 216576., 217308., 218040., 218772.], units='hours since epoch', standard_name='time')
In [4]: time.units.num2date(time.points) # Points are mid-month for gregorian calendar
Out[4]:
array([cftime.DatetimeGregorian(1994, 1, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 2, 15, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 3, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 4, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 5, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 6, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 7, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 8, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 9, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 10, 16, 12, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 11, 16, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
dtype=object)
In [5]: time.guess_bounds()
In [6]: time.units.num2date(time.bounds) # Bounds are not start/end of month
Out[6]:
array([[cftime.DatetimeGregorian(1994, 1, 1, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 1, 31, 6, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 1, 31, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 3, 1, 18, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 3, 1, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 3, 31, 18, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 3, 31, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 5, 1, 6, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 5, 1, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 5, 31, 18, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 5, 31, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 7, 1, 6, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 7, 1, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 8, 1, 0, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 8, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 8, 31, 18, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 8, 31, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 10, 1, 6, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 10, 1, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 10, 31, 18, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 10, 31, 18, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 12, 1, 6, 0, 0, 0, has_year_zero=False)],
[cftime.DatetimeGregorian(1994, 12, 1, 6, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1994, 12, 31, 18, 0, 0, 0, has_year_zero=False)]],
dtype=object)
and the equivalent with ANTS 0.19:
In [1]: import iris
In [2]: import ants
In [3]: time = iris.coords.DimCoord(points=[210756., 211464., 212172., 212904., 213636., 214368., 215100.,
   ...: 215844., 216576., 217308., 218040., 218772.], units='hours since epoch', standard_name='time')
In [4]: time.units.num2date(time.points) # Points are mid-month for gregorian calendar
Out[4]:
array([real_datetime(1994, 1, 16, 12, 0),
real_datetime(1994, 2, 15, 0, 0),
real_datetime(1994, 3, 16, 12, 0),
real_datetime(1994, 4, 16, 0, 0),
real_datetime(1994, 5, 16, 12, 0),
real_datetime(1994, 6, 16, 0, 0),
real_datetime(1994, 7, 16, 12, 0),
real_datetime(1994, 8, 16, 12, 0),
real_datetime(1994, 9, 16, 0, 0),
real_datetime(1994, 10, 16, 12, 0),
real_datetime(1994, 11, 16, 0, 0),
real_datetime(1994, 12, 16, 12, 0)], dtype=object)
In [5]: ants.utils.coord.guess_bounds(time)
In [6]: time.units.num2date(time.bounds) # Bounds are now start/end of month
Out[6]:
array([[real_datetime(1994, 1, 1, 0, 0), real_datetime(1994, 2, 1, 0, 0)],
[real_datetime(1994, 2, 1, 0, 0), real_datetime(1994, 3, 1, 0, 0)],
[real_datetime(1994, 3, 1, 0, 0), real_datetime(1994, 4, 1, 0, 0)],
[real_datetime(1994, 4, 1, 0, 0), real_datetime(1994, 5, 1, 0, 0)],
[real_datetime(1994, 5, 1, 0, 0), real_datetime(1994, 6, 1, 0, 0)],
[real_datetime(1994, 6, 1, 0, 0), real_datetime(1994, 7, 1, 0, 0)],
[real_datetime(1994, 7, 1, 0, 0), real_datetime(1994, 8, 1, 0, 0)],
[real_datetime(1994, 8, 1, 0, 0), real_datetime(1994, 9, 1, 0, 0)],
[real_datetime(1994, 9, 1, 0, 0),
real_datetime(1994, 10, 1, 0, 0)],
[real_datetime(1994, 10, 1, 0, 0),
real_datetime(1994, 11, 1, 0, 0)],
[real_datetime(1994, 11, 1, 0, 0),
real_datetime(1994, 12, 1, 0, 0)],
[real_datetime(1994, 12, 1, 0, 0),
real_datetime(1995, 1, 1, 0, 0)]], dtype=object)
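For comparison, the start/end-of-month result can be sketched with only the standard library. This is a rough illustration under stated assumptions, not the ANTS implementation: the helper name `month_bounds_hours` is hypothetical, the points are taken to be hours since 1970-01-01 00:00, and Python's `datetime` calendar stands in for the Gregorian calendar (the two agree for modern dates).

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed reference date for 'hours since epoch'
ONE_HOUR = timedelta(hours=1)


def month_bounds_hours(points_hours):
    """Return [start, end] of the calendar month containing each point.

    Hypothetical helper for illustration; values are hours since EPOCH.
    """
    bounds = []
    for hours in points_hours:
        point = EPOCH + timedelta(hours=float(hours))
        start = datetime(point.year, point.month, 1)
        # First day of the following month, rolling the year over after December.
        if point.month == 12:
            end = datetime(point.year + 1, 1, 1)
        else:
            end = datetime(point.year, point.month + 1, 1)
        bounds.append([(start - EPOCH) / ONE_HOUR, (end - EPOCH) / ONE_HOUR])
    return bounds


# Mid-January and mid-December 1994 from the example above.
print(month_bounds_hours([210756.0, 218772.0]))
```

Unlike the midpoint rule, each cell here depends only on the month containing its own point, so adjacent cells are contiguous only when the points cover consecutive months.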
In line with the design decision in #4723, it is more likely that a user argument should be provided, rather than special behaviour for specific cases. We're trying to shy away from Iris 'magically' guessing what the user might want.
If others agree with this, then I guess there's less debate about whether it should be implemented - it would be opt-in behaviour.
> it is more likely that a user argument should be provided...
That makes a great deal of sense to me. The current behaviour has Iris doing exactly what the user tells it to do, and is consistent with non-time coordinates. Having a flag such as "align_with_months" (or similar) to opt in to more lenient behaviour, one that is aware of the points being mid-month, feels like a user-friendly way to handle the irregularity of the Gregorian calendar.
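As a rough idea of what "aware of the pattern" could mean, an opt-in flag might first check that every point really is the midpoint of its month before aligning the bounds. The check below is only a sketch: `is_mid_month` is a hypothetical name, not an existing Iris or ANTS function, and it uses standard-library `datetime` objects rather than `cftime` ones.

```python
from datetime import datetime


def is_mid_month(point):
    """Return True if `point` is exactly halfway through its calendar month.

    Hypothetical check for illustration; not part of Iris or ANTS.
    """
    start = datetime(point.year, point.month, 1)
    # First day of the following month, rolling the year over after December.
    if point.month == 12:
        end = datetime(point.year + 1, 1, 1)
    else:
        end = datetime(point.year, point.month + 1, 1)
    return point == start + (end - start) / 2


print(is_mid_month(datetime(1994, 1, 16, 12)))  # 31-day month, midpoint is the 16th at 12:00
print(is_mid_month(datetime(1994, 2, 15)))      # 28-day month, midpoint is the 15th at 00:00
print(is_mid_month(datetime(1994, 2, 16)))      # one day past the midpoint
```

If any point fails the check, the flag could raise an error rather than silently guess, which would keep the behaviour explicit.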
@hdyson has said their team will put up a PR in due course 👍
@hdyson this is currently assigned to you, but since we did this your team has gone through some changes. Do you still want this?
@hdyson I've unassigned you from this issue. Are you still keen to see it addressed?
Just wanted to confirm that this isn't a WIP that we don't know about ... otherwise, we'll consider it for future work.
Thanks
@trexfeathers, @bjlittle Thanks - you're both spot on. It is functionality we would like, but it's not something that's being actively worked on by us.