icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Fix bugs in several calendars with new continuity test

Open sffc opened this issue 1 year ago • 1 comments

Fixes #2703 Fixes #4914 Mostly fixes #2713

I wrote a test that checks for invariants on calendar behavior. In addition to ISO round-trip, it tests simple day-based calendar arithmetic, which is supported in all calendars, and it exercises a lot of code paths that aren't otherwise covered, such as month length, year length, and leap year status, which unveiled bugs in multiple calendars (Observational Islamic, Saudi Islamic, Hebrew, Chinese, Coptic, and Ethiopian).

There are several loosely related changes in this PR so I made an effort for it reviewable commit-by-commit.

sffc avatar May 15 '24 04:05 sffc

Tests are failing because I need to regen Chinese cached calendar data (does this break semver or is this new in 1.5?), but I want to wait until we discuss the above question.

sffc avatar May 15 '24 05:05 sffc

I disagree with the commit changing the bounds of the Chinese calendar. Firstly, the calendar is not that old (even the thing about January 20 in 30 AD is dubious, but that at least works with the math).

More importantly, the way the Chinese new year is defined ("such that the winter solstice falls in the eleventh month") mathematically gives you a narrow range of ~30 days in which it may occur. I don't have the math at hand but when implementing this optimization I did the math and.

It also matches Aslaksen's The Mathematics of the Chinese Calendar.

This is a researched and mathematically accurate set of bounds and I do not think we should get rid of it because floating point error is breaking tests for date ranges that are irrelevant for the calendar.

Manishearth avatar May 21 '24 16:05 Manishearth

It's no longer an optimization since we have unused space there but I'd still prefer to keep it around, if the computer is producing dates outside of that range the computer is probably wrong.

Manishearth avatar May 21 '24 16:05 Manishearth

Please see my comment above regarding the Chinese calendar drift.

sffc avatar May 21 '24 16:05 sffc

Ah, I was reviewing this on a phone and couldn't see all the comments. I am still leaning towards considering those as cases we shouldn't try to fix. I'm somewhat open to bumping up the new year offset range to allow for more dates, it doesn't cost us anything. We should still document this well and have assertions that between 31 AD and maybe 2000 years from now, the new year falls in the narrower range specified, citing Reingold and Aslasken.

Manishearth avatar May 21 '24 18:05 Manishearth

According to Wolfram Alpha (query):

Winter Solstice Time Winter Solstice Date
2:41 pm LMT Saturday, November 24, 20000 BC (extrapolated Gregorian calendar)
11:27 am LMT Saturday, November 28, 19000 BC (extrapolated Gregorian calendar)
4:45 am LMT Saturday, December 1, 18000 BC (extrapolated Gregorian calendar)
4:27 pm LMT Friday, December 4, 17000 BC (extrapolated Gregorian calendar)
8:57 pm LMT Thursday, December 6, 16000 BC (extrapolated Gregorian calendar)
5:09 pm LMT Wednesday, December 9, 15000 BC (extrapolated Gregorian calendar)
4:16 am LMT Tuesday, December 11, 14000 BC (extrapolated Gregorian calendar)
6:30 am LMT Sunday, December 13, 13000 BC (extrapolated Gregorian calendar)
12:16 am LMT Friday, December 14, 12000 BC (extrapolated Gregorian calendar)
10:50 am LMT Tuesday, December 15, 11000 BC (extrapolated Gregorian calendar)
3:31 pm LMT Saturday, December 15, 10000 BC (extrapolated Gregorian calendar)
4:09 pm LMT Wednesday, December 16, 9000 BC (extrapolated Gregorian calendar)
2:22 pm LMT Sunday, December 16, 8000 BC (extrapolated Gregorian calendar)
11:41 am LMT Thursday, December 17, 7000 BC (extrapolated Gregorian calendar)
9:18 am LMT Monday, December 17, 6000 BC (extrapolated Gregorian calendar)
7:58 am LMT Friday, December 18, 5000 BC (extrapolated Gregorian calendar)
8:15 am LMT Tuesday, December 18, 4000 BC (extrapolated Gregorian calendar)
9:47 am LMT Saturday, December 19, 3000 BC (extrapolated Gregorian calendar)
12:16 pm LMT Wednesday, December 19, 2000 BC (extrapolated Gregorian calendar)
2:50 pm LMT Sunday, December 20, 1000 BC (extrapolated Gregorian calendar)
4:21 pm LMT Thursday, December 20, 1 AD (extrapolated Gregorian calendar)
9:54 am LMT Sunday, December 21, 1000 AD (extrapolated Gregorian calendar)
5:21 am PST Thursday, December 21, 2000
8:18 pm PST Sunday, December 21, 3000
5:21 am PST Thursday, December 21, 4000
7:44 am PST Sunday, December 21, 5000
3:02 am PST Wednesday, December 20, 6000
2:59 pm PST Friday, December 19, 7000
7:57 pm PST Sunday, December 17, 8000
6:36 pm PST Tuesday, December 16, 9000
12:01 pm PST Thursday, December 14, 10000
1:28 am PST Saturday, December 13, 11000
12:14 pm PST Sunday, December 10, 12000
9:42 pm PST Monday, December 8, 13000
6:41 am PST Wednesday, December 6, 14000
3:53 pm PST Thursday, December 4, 15000
1:10 am PST Saturday, December 2, 16000
10:11 am PST Sunday, November 30, 17000
5:55 pm PST Monday, November 27, 18000
11:13 pm PST Tuesday, November 25, 19000
12:47 am PST Thursday, November 23, 20000

sffc avatar May 21 '24 18:05 sffc

The problem is that everything currently ends up calculating a PackedChineseBasedYearInfo which has this requirement about the allowed start dates in Jaunary/February, but it's trivial to break that requirement, as shown above.

Should we use a different non-packed intermediate type without so many constraints?

sffc avatar May 21 '24 18:05 sffc

The range of dates Temporal supports is from -100 million to 100 million days on either side of January 1 1970 (about 273000 AD).

sffc avatar May 21 '24 18:05 sffc

According to Wolfram Alpha (query):

I understand this. I don't think that affects what I said: These dates are beyond precise predictability and it is not meaningful to talk about them in this way. There is probably going to be a shift in the solstice of that form, but I don't consider those dates as important for the Chinese calendar in the first place.

Should we use a different non-packed intermediate type without so many constraints?

No, we should not introduce a separate code path. I am somewhat open to relaxing the constraints provided it doesn't have a perf impact (and has the strong assertions for the actual date ranges we care about), but we should not introduce multiple codepaths in calendar code (once a Date has been constructed) since that makes it so much easier for things to subtly break.

Manishearth avatar May 21 '24 23:05 Manishearth

According to Wolfram Alpha (query):

I understand this. I don't think that affects what I said: These dates are beyond precise predictability and it is not meaningful to talk about them in this way. There is probably going to be a shift in the solstice of that form, but I don't consider those dates as important for the Chinese calendar in the first place.

I do care about making things work for all dates in the range supported by Temporal. They don't have to be perfectly accurate, but the data model should support them.

Even if I increase the date offset to 64 days, we only get to about 20,000 AD/BC which is still only about 10% of the range required by Temporal.

Should we use a different non-packed intermediate type without so many constraints?

No, we should not introduce a separate code path. I am somewhat open to relaxing the constraints provided it doesn't have a perf impact (and has the strong assertions for the actual date ranges we care about), but we should not introduce multiple codepaths in calendar code (once a Date has been constructed) since that makes it so much easier for things to subtly break.

I don't want to introduce another code path, but currently we use PackedChineseBasedYearInfo in the DateInner's ChineseBasedYearInfo. I was proposing that we could instead use the packed year info in the data provider but then expand its fields into ChineseBasedYearInfo, increasing its stack size a little bit but removing the constraint on the new years offset, and for calculated data we go directly into the unpacked struct.

sffc avatar May 22 '24 00:05 sffc

Brainstorming a few options:

  1. Keep the invariants as-is, debug-panic when violated, and fall back to January 19 as the new year, which may or may not be a new moon. (current behavior on main)
  2. Keep the invariants as-is, debug-panic when violated, and hack the Winter Solstice to always land on December 20/21, so that the new year contributes to be on a new moon.
  3. Use an unpacked date inner type without the constraints, and use whatever garbage we get from the solstice function to calculate the new year. Note that it's possible to end up with a negative offset from January 1 when the solstice starts drifting into November or earlier.
  4. Fall back to Proleptic Gregorian dates and eras when outside of a max supported range that we define for the Chinese calendar, such as 1000 BC to 5000 AD (±3000 years). -- we already do this in Japanese and could do it in other calendars as well.
  5. Allow offsets up to 64 days within PackedChineseBasedYearInfo with January 1 as the reference, debug-panic if out of range, and fall back to January 1 as the new year. (what is currently in this PR)

sffc avatar May 22 '24 00:05 sffc

Right, I understand your proposal, I don't want to increase the stack size of the Date type here.

I think the status quo with an explicit clamp for the solstice and no panic when the years are out of "bounds" is my ideal choice, and I don't think it breaks the data model. However I think expanding to Jan 1 is acceptable too.

In general I am fine with restricting our assertions to year ranges we consider relevant.

Manishearth avatar May 22 '24 01:05 Manishearth

Okay, as demonstrated in #4929, increasing the size of the offset above 32 bits is unavoidable.

Given the increased bits, I've now implemented a solution that pins the winter solstice to December 20-23 and allows the new year to land as early as January 19, which is the earliest the second New Moon could occur after December 20. It's not actually possible for a real-life Chinese New Year to land on January 19, because if the Winter Solstice were December 20 and there was a 29-day month starting on December 21, either it or the month after it would be a leap month. However, since we pin the winter solstice, it is possible that there are no winter leap months, and therefore January 19 ends up being a possible output of the algorithm.

This solution is still better than the original solution in this PR because it works for a longer range of years and should theoretically be able to work for an arbitrarily long range.

sffc avatar May 22 '24 19:05 sffc

I already pulled out some of the changes, but if desired, I can split this PR up further.

sffc avatar May 22 '24 19:05 sffc

Branch with individual commits archived at https://github.com/sffc/omnicu/tree/archive/negative-tests

sffc avatar May 24 '24 07:05 sffc