iris icon indicating copy to clipboard operation
iris copied to clipboard

BUG: iris pp treats standard calendar as a proleptic gregorian and ignores real proleptic gregorian calendars

Open cpelley opened this issue 6 years ago • 24 comments

https://github.com/SciTools/iris/blob/75c7570bf4d509295e239aff47b70324cd68c22d/lib/iris/fileformats/pp_save_rules.py#L398-L404

Standard calendar cannot be treated as a Proleptic Gregorian for dates 1582-01-01 More context here: ancil/ticket #1159

UM F03 more recently clarified that the calendar it expects isn't Standard but a Proleptic Gregorian. Not only is iris pp save ignoring time coordinates with proleptic_gregorian calendars entirely, it also allows saving Standard dates prior to 1582-01-01 (as though they are proleptic gregorian dates when they are not).

cpelley avatar Nov 26 '19 10:11 cpelley

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions[bot] avatar Aug 11 '21 00:08 github-actions[bot]

this is still important but I don't have any time to fix it myself

nhsavage avatar Aug 12 '21 13:08 nhsavage

If I understand this issue and the PP spec correctly, it isn't possible to save a PP file with a non-Proleptic Gregorian calendar. Possibly behaviours for Iris when asked to save a cube with a Standard calendar seems limited to

  • leave the calendar unconverted and don't warn (current behaviour)
  • leave the calendar unconverted and warn
  • throw error (bad, breaks valid existing code)
  • warn then convert to Proleptic Gregorian calendar then save (breaking change)
  • silently convert to Proleptic Gregorian calendar then save (breaking change)
  • two or more of the above behaviours, selected by a flag (which would need a default)

Are there other alternatives? Which of these would be preferred?

wjbenfold avatar Feb 03 '22 13:02 wjbenfold

taking this back to basics. The problem is that the orginal UMDP F3 which was the basis of the pp load and save rules was somewhat lose in its teminology and referrred to the UM's calendar as a Standard calendar in one place and Proleptic Gregorian somewhere else. On closer inspection, I determined that it is in fact Proleptic Greogrian. However, the pp load and save rules are treating it as Standard

This means there is a symetrical error here: https://github.com/SciTools/iris/blob/main/lib/iris/fileformats/pp.py#L1064

I thefore think there are actually two changes needed here which should cancel out in all round trip pp tests:

in pp.py change

calendar = cf_units.CALENDAR_STANDARD to calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN

and in

pp_save_rules.py

change

elif time_coord.units.calendar == 'standard':

to

elif time_coord.units.calendar == 'proleptic_gregorian':

nhsavage avatar Feb 03 '22 14:02 nhsavage

assuming a round trip with pp data is unchanged, then there are two other cases:

  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results
  2. loading from other file formats which state that the data has Standard calendar - this is the hard one. a. we could throw an error here, which would be justified in that UM doesn't support proleptic gregorian calendars b. we could convert to proleptic gregorian. This would only affect data in Standard calendar a with a time base or a date before 1582. This is a real niche issue as almost all climate models use a Proleptic Gregorian not a Standard calendar as otherwise you need a lot of calendar code to cope with the transition from the Julian to the Gregorian calendar.

nhsavage avatar Feb 03 '22 14:02 nhsavage

What I could probably do is make the above changes (except the conversion of standard calendar) and then see which, if any, of the tests break

nhsavage avatar Feb 03 '22 14:02 nhsavage

https://github.com/SciTools/cf-units/issues/203 currently needs parallel changes in Iris, which seem quite relevant here - notably that pp files don't handle cf's "standard" calendar

wjbenfold avatar Feb 03 '22 16:02 wjbenfold

  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results

I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)

  1. loading from other file formats which state that the data is Standard calendar - this is the hard one. a. we could throw an error here, which would be justified in that UM doesn't support proleptic gregorian calendars b. we could convert to proleptic gregorian. This would only affect data in Standard calendar a with a time base or a date before 1582. This is a real niche issue as almost all climate models use a Proleptic Gregorian not a Standard calendar as otherwise you need a lot of calendar code to cope with the transition from the Julian to the Gregorian calendar.

I'd be happy converting to the right calendar and then doing the save without asking the user. I think this is breaking in the same way as the other effect - it invalidates workarounds / changes results that should be reproducible.

wjbenfold avatar Feb 03 '22 16:02 wjbenfold

Sorry I’ve not read this thread in detail, but https://github.com/SciTools/cf-units/issues/185 seems relevant, and I understand @larsbarring already has a branch for it.

rcomer avatar Feb 03 '22 18:02 rcomer

  1. loading from pp and saving to some other format e.g. netCDF - this will change the calendar of the output and can be considered a bug fix which changes results

I agree that this is a bug fix. Given the likelihood that downstream users have a workaround it seems like a breaking change. We could look at hiding it behind a FUTURE flag or something (though I'm not 100% on what that entails)

I think this will break more than just saving to other formats. For example, if you plot a time series and your calendar is "standard", the values are converted to datetime.datetime and matplotlib handles them out of the box. For any other calendar, we rely on nc-time-axis, where features such as date-locators and -formatters are (probably?) not as mature. So we risk breaking a lot of plotting code. https://github.com/SciTools/iris/blob/75c7570bf4d509295e239aff47b70324cd68c22d/lib/iris/plot.py#L590-L608

For many (most?) UM applications, the data won't go back far enough for the standard/proleptic_gregorian distinction to be meaningful, so I think it would be good to permanently retain the option of cubes loaded from PP having the "standard" calendar.

rcomer avatar Feb 17 '22 09:02 rcomer

it sounds like this is an issue we will have to live with. Trying to make this backward compatible seems very hard but there is no consensus to make it a breaking change, so we will have to live with it. I propose to close this issue as "won't be fixed"

nhsavage avatar Feb 17 '22 09:02 nhsavage

As I understand it, we are only just now starting to have climate configurations with a proper calendar, whereas historically they were 360_day. So the issue could start to become a genuine problem where it wasn't before (though I'm not sure how far back even the climate experiments go).

rcomer avatar Feb 17 '22 09:02 rcomer

We make breaking changes sometimes, and with good reasons. This will be a useful place to start if in future it becomes more important to match the actual meaning of pp dates.

In the meantime, would a warning be helpful, so that users this is affecting have a pointer towards where the bug is coming from in their code that assumes Iris does this correctly?

wjbenfold avatar Feb 17 '22 10:02 wjbenfold

No one ever looks at warnings.

nhsavage avatar Feb 17 '22 13:02 nhsavage

I've just learned something...

from cf_units import Unit

u = Unit("days since 1970-01-01", calendar="standard")
gap_dates = range(-141430, -141420)
print(u.num2date(gap_dates))

shows Julian/Gregorian gap as expected

[cftime.DatetimeGregorian(1582, 10, 2, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 3, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 4, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 15, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 16, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 17, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 18, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 19, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 20, 0, 0, 0, 0, has_year_zero=False)
 cftime.DatetimeGregorian(1582, 10, 21, 0, 0, 0, 0, has_year_zero=False)]

but

print(u.num2pydate(gap_dates))

shows no gap, because the datetime library is actually Proleptic Gregorian.

datetime assumes the current Gregorian calendar extended in both directions

[real_datetime(1582, 10, 12, 0, 0) real_datetime(1582, 10, 13, 0, 0)
 real_datetime(1582, 10, 14, 0, 0) real_datetime(1582, 10, 15, 0, 0)
 real_datetime(1582, 10, 16, 0, 0) real_datetime(1582, 10, 17, 0, 0)
 real_datetime(1582, 10, 18, 0, 0) real_datetime(1582, 10, 19, 0, 0)
 real_datetime(1582, 10, 20, 0, 0) real_datetime(1582, 10, 21, 0, 0)]

which rather points to the plotting code being wrong. Possibly there are other implications but my brain hasn't got there yet...

rcomer avatar Feb 25 '22 16:02 rcomer

I wonder if we could actually have our cake and eat it here. Noting that

  • 1st January 1970 is the same date in the Proleptic Gregorian and Standard calendars
  • Any time point expressed as hours since 1st January 1970 is also therefore the same in the Proleptic Gregorian and Standard calendars

So, once we have converted our pp time headers into a coordinate with unit "hours since 1970-01-01 00:00:00", it is somewhat arbitrary whether we label it "proleptic_gregorian" or "standard". Would it therefore be reasonable to use the Proleptic Gregorian calendar to do the conversion from pp-header to coordinate, but still label the coordinate with the "standard" calendar? It would be a bit of a fudge, but would mean any change in behaviour would be limited to dates before 15th October 1582, and therefore a bugfix.

Based on https://github.com/SciTools/iris/issues/3561#issuecomment-1050997981, pp-saving could use num2pydate instead of num2date. Then the saved dates would be Proleptic Gregorian regardless of which real-world calendar was on the coordinate.

rcomer avatar Oct 25 '22 16:10 rcomer

I have to admit, I am now getting a bit lost with all of this. It came about because I needed to convert some netCDF data from the ACCESS model to ancils. That model stored the date with a time of "days since 1-1-1 0:00" or similar and when it was saved to ancil, the time was shifted. I worked around this by converting the time units to be post 1582. I am not sure if this fix would still be needed with your proposed cake fix?

nhsavage avatar Oct 27 '22 08:10 nhsavage

Hmmm. It looks like num2pydate won't even work if you have "days since 1-1-1 0:00"

import cf_units
tunit = cf_units.Unit("days since 1-1-1 0:00")
tunit.num2pydate(42)
--> 573 dates = cftime.num2date(time_values, **num2date_kwargs)
    574 try:
    575     # We can assume all or none of the dates have a microsecond attribute
    576     microseconds = np.array([d.microsecond if d else 0 for d in dates])

File src/cftime/_cftime.pyx:504, in cftime._cftime.num2date()

ValueError: illegal calendar or reference date for python datetime

But since cf_units v3.1 we can do

new_unit = tunit.change_calendar("proleptic_gregorian")
print(new_unit.num2date(0))
0000-12-30 00:00:00

So maybe pp-saving should enforce a Proleptic Gregorian calendar before doing its conversions to headers.

I need more caffeine...

rcomer avatar Oct 27 '22 09:10 rcomer

Noting that the calendar previously called "gregorian" in the cf-conventions is now called "standard", I've updated the comments through this issue to hopefully reduce confusion.

rcomer avatar Nov 02 '22 11:11 rcomer

This no longer needs a team discussion of the principles, it needs a concrete proposal for people to discuss. So something to assign as a project for someone in future. If we can't find someone to assign, I will close this (unless it gets enough votes).

trexfeathers avatar Jan 18 '23 10:01 trexfeathers

proposal

in pp.py change

calendar = cf_units.CALENDAR_STANDARD to calendar = cf_units.CALENDAR_PROLEPTIC_GREGORIAN

and in

pp_save_rules.py

change

elif time_coord.units.calendar == 'standard':

to

elif time_coord.units.calendar == 'proleptic_gregorian':

then we have to decide how to handle files in a standard a calendar

option 1: error option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582

nhsavage avatar Jan 18 '23 14:01 nhsavage

option 2: convert calendar to proleptic greogrian before saving. Will only affect results for dates before 1582 I think this is the best solution here.

nhsavage avatar Jan 18 '23 14:01 nhsavage

I have stuck up a PR based on my comments above at #5138

rcomer avatar Jan 18 '23 16:01 rcomer

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions[bot] avatar Jun 14 '24 00:06 github-actions[bot]

In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.

If this issue is still important to you, then please comment on this issue and the stale label will be removed.

Otherwise this issue will be automatically closed in 28 days time.

github-actions[bot] avatar Oct 28 '25 00:10 github-actions[bot]

for now it would be worth this staying open as #5138 is close to being ready

nhsavage avatar Oct 28 '25 08:10 nhsavage