icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Fix checksum validation in datetime time zone names

Open sffc opened this issue 9 months ago • 5 comments
trafficstars

Currently we check the checksum only on the first load of a time zone name payload. However, we should retain the checksum in case we need to do further loads, at least the one in mz_periods.

This bug can probably be triggered by making a formatter with the pattern "ZZZZ VVVV" (two different time zone fields that both require loading metazone periods).

Also, consider optimizing BlobSchema::get_checksum to use the probe function. We shouldn't need to binary-search the ZeroTrie.

https://github.com/unicode-org/icu4x/pull/6046/#discussion_r1939671585

sffc avatar Feb 03 '25 16:02 sffc

I'm hitting this error in ZonedDateTimeFormatter FFI, #6233

It gets triggered because I first load the names into the DateTimeNames with the helper functions and then call try_into_formatter which calls load_for_pattern which tries loading the same names again.

sffc avatar Mar 07 '25 08:03 sffc

I worked around the case hit in #6233 by relaxing the check for is_none(). If all three loads return None, then we can assume that all of the payloads were cached.

The bug in the OP is still reachable, I think.

sffc avatar Mar 07 '25 08:03 sffc

I'm not sure about the best way to resolve this issue and whether it impacts 2.0 stability.

sffc avatar Apr 24 '25 22:04 sffc

@Manishearth suggested a solution (I thought in a comment but maybe not): we can save the checksum in the Names but not in the Formatter, so as not to increase stack size for no good reason.

This fixes FixedCalendarDateTimeNames used in a builder pattern. I'm implementing it in #6493. I consider this to be by far the most common use case. My fix should also address patterns with multiple zones, as in "ZZZZ VVVV".

However, there's still a case when this can fail, which I added as a docs test. If you start with a Formatter and then convert it later to a Names, you don't have the checksum around any more, so we must start failing.

If someone needs to use FixedCalendarDateTimeNames in this way, here are a couple approaches:

  1. Alongside FixedCalendarDateTimeNames::from_formatter, add ::from_formatter_with_zone_checksum. Ask users to inspect the data load operation externally to get the checksum (similar to how we ask them to do this to get the resolved locale) and then feed it into that function.
  2. Alongside FixedCalendarDateTimeNames::from_formatter, add ::from_formatter_unchecked. This would set the internal saved checksum to None, allowing further load operations to continue without error, possibly in a way that produces inconsistent time zone names.
  3. Eat the stack size cost and save the checksum in the formatter. Possibly offset this cost by not having separate slots for generic and specific names at the same time.

I want to emphasize that these checksums are here for good reason. You cannot use metazone period data from CLDR 47 with time zone names data from CLDR 48. It will cause the wrong time zone to be displayed because the indices are wrong.

sffc avatar Apr 25 '25 01:04 sffc

Oops, never posted this comment, and now it's less relevant, but posting for posterity:

It seems like FixedCalendarDateTimeNames and DateTimeNames are in spirit builder APIs, that eventually build datetime formatters. It is fine for them to transiently track checksums for loaded timezone data, builder APIs often track such state.

These builder types should not be what is directly contained in DateTimeFormatter (which is not currently the case, DateTimeFormatter contains RawDateTimeNames, which is the shared internal type). There was a brief period where we were considering having the same type there and we decided to encapsulate things this way, and I'm glad we did, since it makes this decision straightforward.

Manishearth avatar Apr 25 '25 04:04 Manishearth