icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

Remove `Hijri<AstronomicalSimulation>` logic

Open robertbastian opened this issue 1 month ago • 7 comments

This reduces the code size of date_try_from_fields.wasm from 47kB to 19kB.

We still include 250 years of hardcoded data, and fall back to the tabular algorithm outside that range.

This calendar specifically documents that we can change the range of the astronomical calculations (which are incorrect anyway).

This is not the size increase observed in https://github.com/unicode-org/icu4x/issues/7201, as Temporal excludes this calendar.

robertbastian avatar Dec 10 '25 14:12 robertbastian

I think the docs say everything we need them to say.

These simulations are unofficial and are known to not necessarily match sightings on the ground.

As floating point arithmetic degenerates for far-away dates, this falls back to the tabular calendar at some point.

The precise behavior of this calendar may change in the future if:

  • We decide to tweak the precise astronomical simulation used
  • We decide to expand or reduce the range where we are using the astronomical simulation.

robertbastian avatar Dec 10 '25 16:12 robertbastian

Ah, we still include the hardcoded data.

Manishearth avatar Dec 10 '25 16:12 Manishearth

I thought our agreement was to retain Hijri Astronomical Simulation as a non-AnyCalendar.

It already is an AnyCalendar

Manishearth avatar Dec 10 '25 19:12 Manishearth

I'm very confused then. I feel like we keep asking this question every month policy for which calendars go where. Is it written down anywhere?

sffc avatar Dec 10 '25 19:12 sffc

I'm confused: How is the policy about which calendars go where relevant? This calendar is already here, because when we first did AnyCalendar we followed Temporal/ICU4C at the time, but Temporal changed its mind after further investigation. I consider this calendar variant legacy: I didn't in the past, but I do now.

The solution you propose (retain HijriAstronomical as a non-AnyCalendar) is not possible because it is currently already AnyCalendar, and that has nothing to do with whatever policy we may settle on around this.

The closest analogue to that solution is to simplify the variant in AnyCalendar (as is being done in this PR), and potentially expose a different type as new_mecca_simulated_full() or something.

We could "simplify" by completely just replacing it with tabular internally (and deprecating).

The nice thing is that because Hijri is now generic it's easy enough for clients to put this together.

Manishearth avatar Dec 10 '25 21:12 Manishearth

A solution that is more consistent with how I see the role of Hijri Astronomical Simulation would be to take the AnyCalendar variant, mark it deprecated, make it use the Tabular behavior, and then keep the Hijri Astronomical Simulation around in an always calculating mode.

sffc avatar Dec 10 '25 23:12 sffc

that would require new types. I don't think restricting this calendar to the 250 years where we have hardcoded data is going to be an issue for anyone.

robertbastian avatar Dec 10 '25 23:12 robertbastian

  • @hsivonen Should not have shipped at all. Given that, and given that it is a Reingold sim with mecca as a ref point, and we don't have evidence of anyone specifically needing Reingold-with-Mecca, and we already have KACST-mecca-based in UAQ, could we resolve this by marking this as deprecated and remove all of its internals, and makes its internals delegate to UAQ. And then we would still be offering a mecca-based sim, not Reingold-based but the authoritative one. Which would resolve codesize issues. And we wouldn't be hurting anyone since we don't have users for it.
  • @manishearth used to think this was intentional, now I think
  • @sffc This PR isn't about Hijri, it's about try_from_fields codesize. It's about making TFF load less calendar data. We shouldn't be discussing whether we want to ship simulation or not, we already decided on not shipping simulation. I'm surprised we still are. Also about JapaneseExtended?
  • @sffc I'm a bit confused about the decision here. I'm always surprised when I see that icu_calendar isn't matching that decision
  • @hsivonen Do you have a practical problem with what I suggested?
  • @sffc I want to decouple these problems.
  • @manishearth so the history here is that in 0.5 we knew japanext was a mess and we made changes to formatting and such to move away from it. At the time we weren't too worried about AnyCalendar codesize. Later we started caring more, and we should have removed it for 2.0 but it slipped. rgsa got clarified post-2.0.
  • @sffc I'm not sure we ever had an agreement about the scope of AnyCalendar. I recall a discussion pre-2.0 where AnyCalendar would have all the calendars, not just the "good" calendars, which is when we added an internal FormattableAnyCalendar enum in icu_datetime. I also recall a discussion where someone, maybe @robertbastian, suggested having a different enum GoodCalendar, and keeping AnyCalendar as encapsulating everything. Even Julian.
  • @hsivonen given that we are in the situation we have what can we do to move forward. For a library positioned as expert-authored, exporting stuff that is considered bad isn't great.
  • @manishearth https://github.com/unicode-org/icu4x/issues/4889#issuecomment-2122903708
  • @sffc So I think we want an enum that's GoodCalendar, and one that's AllCalendar (which one is AnyCalendar is a open question)
  • @sffc I want the icu_calendar crate to be something where people can add calendar impls, where people can add less-important calendars. People can add new variants of Hijri too. They can go into AllCalendar, not GoodCalendar.
  • @sffc I don't want to change HijriAstronomical because I want that to be in the bucket of things that are not proven but are still interensting
  • @hsivonen From my perspective the ideal end state is we wouldn't have Japanext or HijriSimulated in ICU4X. I think they shouldn't be there but we have semver, and what's the "most nonexistent" we can do in this semver constraints.
  • @hsivonen In a library that comes from the Unicode consortium with i18n expertise, it's actively harmful to have curiosities that are a result of programmers pursuing fascinating things as opposed to end user use cases. Especially in the i18n context it's really hard to understand if a feature that is for some culture that is not the orogrammer's experience, hard to evaluate "is this something that my product's users are really going to need", or is it YAGNI. Having curiosities has this hazard htat has happened with Intl DTF, and unqualified islamic, and islamic-rgsa, and the Swift enum. Stuff exists in a library that is perceived to embody expertise that is taken as "this is the real stuff". If something has attested use cases, there's a case for having it as Tier 2 in icu_calendar, but even Tier 2 should be motivated by attested use cases.
  • @Manishearth So I come from a similar perspective as Henri which is why I think AnyCalendar should be GoodCalendar: GoodCalendar has strong use cases, active users, and AnyCalendar is the current thing that exists and is used. I would like users to come to us with their needs for AllCalendar and we can add it then
  • @sffc Given that the current enum AnyCalendar already contains everything except Julian, a good path would be to add Julian and add a new GoodCalendar instead of moving AnyCalendar to GoodCalendar.
  • @nekevss There was a discussion at some point about AnyCalendar being "web compatible calendar". Pushback against calendars being dictated by ECMA402 from Robert. We could just have ECMACalendar.

Manishearth avatar Dec 11 '25 20:12 Manishearth

From my perspective the ideal end state is we wouldn't have Japanext or HijriSimulated in ICU4X. I think they shouldn't be there but we have semver, and what's the "most nonexistent" we can do in this semver constraints.

The above already captured my suggestion to make (already-deprecated) HijriSimulated delegate to HijriUmmAlQura, which is semantically keeping Mecca-based astronomical simulation but providing the KACST simulation in place of a Reingold simulation, but I also suggested deprecating JapaneseExtended and making making its approximation list empty. We don't currently guarantee a specific number of pre-Meiji eras. As a practical code size matter, this could be implemented by delegating to Japanese.

hsivonen avatar Dec 12 '25 07:12 hsivonen

AnyCalendar is the type that we have advertised, and that clients are using. This pointless toy calendar that makes up 50% of code size should be removed.

robertbastian avatar Dec 15 '25 23:12 robertbastian

  • Robert: My change for AstronomicalSimulation falls within our API guarantees easily since it keeps the calendar for its useful parts.
  • Manish: would prefer to just get rid of it
  • Robert: Shane disprefers that
  • Shane: I think the astronomical calendar is useful for clients. I also think. Currently documentation says that it falls back to the tabular calendar. Not sure why the docs are less good
  • Robert: we didn't spend time on documenting this before.
  • Robert: I want to reduce its validity range to the hardcoded range
  • Shane: Where do we find that range?
  • Robert: simulated_mecca_data, 1899-2140.
  • Manish: future data isn't real
  • Shane: went over this yesterday; (comment in RGSA thread)
  • Shane: would like to not call this trash data
  • Manish: I'm convinced that when we have to use cases for this. Do people actually want a calendar like this?
  • Robert: Two discussions. The RGSA tag, which we've assigned here, and then this calendar type in Rust.
  • Robert: Happy with keeping this calendar around but not identifying it with rgsa, because it's not
  • Manish: Could fix the past data and then it's fine to call it rgsa with the understanding that future dates are predictions
  • Shane: I don't think the past data is wrong
  • Manish: Not sure what people want this to be
  • Shane: I have a clear idea.
  • Robert: if someone wants to use this do they need past data from 1899
  • Shane: probably not.
  • Robert: not sure why this wasn't mergeable before then
  • Shane: fact that I was missing was that the docs say that it was tabular.
  • Manish: I think this is a bandaid but there's enough disagreement that I'm happy landing the bandaid.

General agreement to landing the PR #7301.

Manishearth avatar Dec 16 '25 16:12 Manishearth

image

robertbastian avatar Dec 16 '25 16:12 robertbastian

It's regrettable that this landed with the title "Remove Hijri<AstronomicalSimulation> logic" which is not what this PR does, and is part of what negatively influenced my impression of this pull request.

sffc avatar Dec 16 '25 17:12 sffc

No, your changed title is not correct. Using pre-computed Reingold for 250 years is not a change

robertbastian avatar Dec 17 '25 14:12 robertbastian