ecma402
Normative: Added note about sets of locales for web browser implementations needing to not change as a result of user behaviour
fix #588
Why would this restriction only apply to web browsers?
This PR is in response to feedback from the 2021-05-25 TC39 meeting and is meant to address concerns about potential fingerprinting issues that only pertain to browser implementations.
Is there any reason not to apply the same restrictions to all engines tho? The ideal is that everything applies to everyone equally; having something only apply to a subset of impls is a suboptimal outcome.
updated to apply restriction to all hosts
(please don't land this until 402 editors have reviewed)
The W3C I18N Working Group discussed this PR today in our teleconference. I am adding this comment on our behalf.
We are concerned that this prohibition will disadvantage smaller language/cultural communities who might rely on installation of support to enable locale-based APIs in the browser or JS host.
We feel that precluding the ability to install a locale or the parts of a locale (such as dictionaries for spell check/breaking/etc.) that assist with high-quality presentation on the Web and in JS applications has the potential to negatively impact those communities that cannot depend on support from browser or system vendors. If there is a "fingerprinting" risk associated with such installation, providing a warning to the user might be the most appropriate response.
Also, note that we are currently not aware of any runtimes that allow the list of locales to be updated (other than by updating the entire underlying ICU build), so this strikes us as preventing a potentially useful feature from ever existing. We also note that CLDR releases include new locales twice each year, so presumably browsers would change their list of available locales as updates propagate.
CC @codehag for feedback on @aphillips' comment above (see issue #588 for a reminder of the problem this PR is trying to solve)
@aphillips doesn't this prohibition only require refreshing the page after installing a new locale?
@ljharb asked:
doesn't this prohibition only require refreshing the page after installing a new locale?
Not as far as I can tell? It seems to require that any two distinct users on the same version on the same platform (not the "same machine") should always get the same enumerable set of available values. The change in cb6449b extends this to include anything (numbering systems, calendars, etc.). I note that one frequent cause of such patching would be time zone data (which is not listed but is the regular source of runtime patching outside the normal release cycle).
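Assuming an engine that implements `Intl.supportedValuesOf` (added in ES2022, available in Node.js 18+), a short script can read the enumerable sets mentioned here and condense them into a single label. The FNV-1a hash and the function names are illustrative choices, not part of any proposal; the point is only that any per-user difference in these sets would surface as a different label.

```javascript
// Keys accepted by Intl.supportedValuesOf (ECMA-402, ES2022).
const KEYS = ["calendar", "collation", "currency", "numberingSystem", "timeZone", "unit"];

// Collect every enumerable set the host exposes for the keys above.
function enumerableIntlSets() {
  const sets = {};
  for (const key of KEYS) {
    sets[key] = Intl.supportedValuesOf(key);
  }
  return sets;
}

// 32-bit FNV-1a hash over UTF-16 code units; used here only to condense
// the sets into a short label.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

const sets = enumerableIntlSets();
// If the restriction holds, this label is identical for two users on the
// same engine version and platform; extra installed data would change it.
console.log(fnv1a(JSON.stringify(sets)));
```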
If I understand the threat here, it's to prevent a bad actor from installing a locale into a user's browser and then using that locale ID (perhaps using a well-formed locale ID like: bad-actr-06b3c) to track the specific user (or to differentiate groups of users). That suggests that there should be no silent installation of locales or locale data, not necessarily an outright prohibition?
There is a similar issue (w3c/css-drafts#4055) related to fingerprinting based on fonts (which have a much higher level of per-installation variability and which, unlike locales, can actually be installed currently 😃). The challenge here is to support under-served communities of users--particularly when the threat is more-or-less theoretical--without exposing large groups of Web users to abusive behavior.
I note that one frequent cause of such patching would be time zone data (which is not listed but is the regular source of runtime patching outside the normal release cycle).
Which hosts and engines currently do runtime patching of time zone data? Are these OS patches or browser patches?
@ben-allen I would like to make sure this covers the possibility of substantially different behavior within the same browser. For example, iPhone has Lockdown Mode (LDM, https://support.apple.com/en-us/HT212650), which explicitly disables some features to add extra defense against targeted attacks. I don't think Intl-related behavior currently changes based on these modes, but I would like to make sure this kind of possibility is explicitly allowed by the statement :)
Eemeli pointed out in the TG2 meeting that this can be treated as a platform difference, and that sounds fine. So I would like to make sure the possibility above counts as a platform difference.
The other parts look pretty good to me! Thanks for your work.
@Constellation can lockdown mode come into effect without reloading the page? If it forces a reload, then this note wouldn't apply at all, since it wouldn't be observable within the lifetime of a program.
TG2 notes: https://github.com/tc39/ecma402/blob/master/meetings/notes-2023-06-01.md#editorial-added-note-about-sets-of-locales-for-web-browser-implementations-needing-to-not-change-as-a-result-of-user-behaviour-780
Seeking feedback from @dminor
Our concern is that on-demand installation of locale data could provide an easy fingerprinting vector for members of smaller language/cultural communities who may face discrimination or persecution, for example by dominant cultural groups or by the government of the country in which they live.
Our position is that it is better to ship data for all locales in a single bundle, which ensures that data for smaller communities is available, without exposing them to a fingerprinting risk.
I believe it would be very difficult to create a user warning that would explain the potential risk in a way that would allow a user to make an informed decision about accepting extra locale data. I suspect most people would ignore these warnings. I'd also point out that there's no guarantee that for a small linguistic community the text of the warning would be localized, which would decrease the likelihood of making an informed decision.
The key point is that the set of locales should not change as a result of user behaviour. We're not trying to prevent vendors from shipping a new bundle of locale data to users as part of an update, just data for individual locales. In the case of Firefox, CLDR and timezone updates are done as part of our normal release cycle anyway.
We are concerned that this prohibition will disadvantage smaller language/cultural communities who might rely on installation of support to enable locale-based APIs in the browser or JS host.
This type of feature was recently requested to me, in the context of ICU4C, and explicitly for the purpose of supporting minority, disadvantaged languages. (Which, yes, could potentially be at-risk for fingerprinting of various kinds.)
We feel that precluding the ability to install a locale or the parts of a locale (such as dictionaries for spell check/breaking/etc.) that assist with high-quality presentation on the Web and in JS applications has the potential to negatively impact those communities that cannot depend on support from browser or system vendors. If there is a "fingerprinting" risk associated with such installation, providing a warning to the user might be the most appropriate response.
Agreed. For example, default line breaking for Thai script in current implementations causes problems for minority-script users. More here.
Also, note that we are currently not aware of any runtimes that allow the list of locales to be updated (other than by updating the entire underlying ICU build), so this strikes us as preventing a potentially useful feature from ever existing. We also note that CLDR releases include new locales twice each year, so presumably browsers would change their list of available locales as updates propagate.
CLDR releases include certain locales as "basic" and above, but there are other locales not included.
ICU4C default build includes certain locales, but not others.
Vendors include certain locales, but not others.
In short, certain locales are already excluded from web implementations. I'm concerned that requiring that these locales cannot be added on the fly could end up negatively impacting users of already-digitally disadvantaged languages.
Is there any reason not to apply the same restrictions to all engines tho? The ideal is that everything applies to everyone equally; having something only apply to a subset of impls is a suboptimal outcome.
On another topic, Node.js has, since the first versions that included Intl by default, had the ability to customize the set of available locales at build time and at runtime, and also to supplement the locales depending on the startup environment. There have also been requests for some way to add locales at runtime there. This language seems to make Node.js v0.12 onwards potentially noncompliant. I don't see the argument for this restriction in that type of environment at all.
@srl295 that doesn't imply to me that it can be changed during the lifetime of a program, only at program start time.
My understanding of this requirement is that once a JS program has started, it can't observe further changes to the list of available locales. To that end, anything that requires refreshing or navigating a page, or, restarting an application or launching a process, in order to observe a different set of locales seems to me that it complies with this requirement.
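This lifetime-scoped reading of the requirement can be modeled as a host that snapshots its locale set when a program starts. Everything below (`HostLocaleStore`, `startProgram`, `availableLocales`) is hypothetical and only illustrates the semantics described in the comment above; it is not how any engine is known to implement them.

```javascript
// Hypothetical host model: the locale set a program sees is frozen when the
// program starts. Installs that happen later become visible only to programs
// started afterwards (for example, after a page reload or process restart).
class HostLocaleStore {
  constructor(initial) {
    this.installed = new Set(initial);
  }

  // Models installing extra locale data on the machine.
  install(tag) {
    this.installed.add(tag);
  }

  // Called once per program start (page load, process launch, ...).
  startProgram() {
    const snapshot = Object.freeze([...this.installed].sort());
    return { availableLocales: () => snapshot };
  }
}

const host = new HostLocaleStore(["en", "fr"]);
const first = host.startProgram();
host.install("cy");
// first still sees only en and fr: the install is not observable to it.
const second = host.startProgram();
// second sees cy, en, and fr.
console.log(first.availableLocales(), second.availableLocales());
```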
@srl295 that doesn't imply to me that it can be changed during the lifetime of a program, only at program start time.
OK. So "user behaviour" is scoped to the JS runtime? That's helpful… I then don't see how fingerprinting is mitigated.
My understanding of this requirement is that once a JS program has started, it can't observe further changes to the list of available locales. To that end, anything that requires refreshing or navigating a page, or, restarting an application or launching a process, in order to observe a different set of locales seems to me that it complies with this requirement.
That would be very different, and would not raise as much concern. Adding locales while running has been discussed as well, but it certainly has a lot of other challenges.
With regard to @aphillips's point: as @dminor has observed, the problem goes beyond the "bad-actr" case, since it would be possible for sites to individually identify users from politically disadvantaged/persecuted groups by checking if they have locale data associated with those groups installed.
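To make this concrete: using only the standard `Intl.getCanonicalLocales` and `Intl.DateTimeFormat.supportedLocalesOf`, a script can probe a list of locale tags and turn the answers into bits. The candidate tags below are arbitrary placeholders; if users could install extra locale data, the resulting bit string would separate otherwise identical installations.

```javascript
// Placeholder list of locale tags a tracking script might probe.
const CANDIDATES = ["en-US", "de", "ja", "cy", "chr"];

// supportedLocalesOf returns the requested locales the implementation can
// serve without falling back to its default locale. If that answer depended
// on user-installed data, it would become a per-user signal.
function probeLocales(candidates) {
  const canonical = Intl.getCanonicalLocales(candidates);
  const supported = new Set(Intl.DateTimeFormat.supportedLocalesOf(canonical));
  // One character per candidate: "1" if supported, "0" otherwise.
  return canonical.map((tag) => (supported.has(tag) ? "1" : "0")).join("");
}

console.log(probeLocales(CANDIDATES));
```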
I've been in touch with @manishearth, who suggested the following as a strategy for avoiding this problem while still providing localized content for users of non-default locales:
- Have an API for loading extra locale data
- The first time any particular website uses that API with a particular non-default locale, it always behaves as if the locale needs to be downloaded. Once it's been downloaded, state is set indicating that that website knows that the locale has been downloaded. This state persists until cleared by user.
- On all other websites, the locale loading API behaves as if the locale hasn't been downloaded and then pretends to download it.
It's possible that the "pretends to download the data" stage could be detected by observing bandwidth changes, which could be a problem. Always re-downloading the data is also a possibility, but it would not be great for people on slow internet connections.
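The per-site strategy above can be modeled in a few lines. This is a hypothetical sketch: `PerSiteLocaleLoader`, `load`, and `clearSite` are invented names, and the "download" is simulated. A real implementation would also need the pretend download to be indistinguishable in timing and bandwidth from a real one, which this model does not attempt.

```javascript
// Hypothetical browser-side model of the per-site strategy.
// "Download" is simulated; no network access happens here.
class PerSiteLocaleLoader {
  constructor() {
    this.globallyInstalled = new Set(); // data actually on disk
    this.knownBySite = new Map(); // origin -> Set of locales it has loaded
  }

  // Returns what the site may observe: "downloaded" the first time a given
  // origin asks for a locale, "cached" on later requests from that origin.
  // Whether another origin already caused the download is never revealed.
  load(origin, locale) {
    let known = this.knownBySite.get(origin);
    if (!known) {
      known = new Set();
      this.knownBySite.set(origin, known);
    }
    if (known.has(locale)) {
      return "cached";
    }
    if (!this.globallyInstalled.has(locale)) {
      this.globallyInstalled.add(locale); // a real download would happen here
    }
    // If the data was already on disk, the site still sees a (pretend) download.
    known.add(locale);
    return "downloaded";
  }

  // Models the user clearing this site's state.
  clearSite(origin) {
    this.knownBySite.delete(origin);
  }
}

const loader = new PerSiteLocaleLoader();
loader.load("https://a.example", "cy"); // returns "downloaded"
loader.load("https://a.example", "cy"); // returns "cached"
// b.example cannot tell that the data is already on disk:
loader.load("https://b.example", "cy"); // returns "downloaded"
```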
Edit: would it be reasonable to include normative language stating that browsers allowing on-the-fly locale loading must not indiscriminately reveal which locales users already have available?
Why is locale data needed on the fly without a page refresh?
Why is locale data needed on the fly without a page refresh?
If I understand your question, a possible answer is: because the page's UI might allow you to change the locale used via a popup menu?
Right, but if it requires a refresh to take effect when a new locale is needed, this requirement is satisfied.
Have an API for loading extra locale data
To add to this, I would imagine this API would be something like ensureLocaleLoaded(locale), which returns a Promise that resolves when the locale is loaded (immediately if the locale was already loaded on this website in the past, otherwise after some time) and rejects if the locale could not be found or there was a problem with the internet connection (which will likely produce identical results across a particular browser version).
There may also be a use for an isLocaleLoaded() API that performs the same check without loading the locale data, returning a Promise<bool>.
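A minimal sketch of the Promise-based shape described above, assuming the hypothetical names `ensureLocaleLoaded` and `isLocaleLoaded` from these comments. `createLocaleApi` and the injected `fetchLocaleData` function are further assumptions standing in for whatever the user agent would actually do; the fake fetcher simply succeeds for a fixed catalog after a short delay.

```javascript
// Hypothetical page-facing API; nothing like it exists in ECMA-402 today.
// fetchLocaleData(tag) -> Promise is an assumed user-agent hook.
function createLocaleApi(fetchLocaleData) {
  const loadedOnThisSite = new Set();

  return {
    // Resolves immediately if this site already loaded the locale, otherwise
    // after fetching it. Rejects for malformed tags, unknown locales, or
    // fetch failures (async turns the RangeError from getCanonicalLocales
    // into a rejection).
    async ensureLocaleLoaded(locale) {
      const [tag] = Intl.getCanonicalLocales(locale);
      if (loadedOnThisSite.has(tag)) return;
      await fetchLocaleData(tag);
      loadedOnThisSite.add(tag);
    },
    // Reports only this site's own history, never global install state.
    async isLocaleLoaded(locale) {
      const [tag] = Intl.getCanonicalLocales(locale);
      return loadedOnThisSite.has(tag);
    },
  };
}

// Fake fetcher for the sketch: "succeeds" for a fixed catalog after 10 ms.
const CATALOG = new Set(["cy", "haw"]);
const api = createLocaleApi((tag) =>
  new Promise((resolve, reject) =>
    setTimeout(() => (CATALOG.has(tag) ? resolve() : reject(new Error(`no data for ${tag}`))), 10)
  )
);

// The log-in scenario: ensure the user's preferred locale, then continue
// without reloading the page.
api.ensureLocaleLoaded("cy")
  .then(() => api.isLocaleLoaded("cy"))
  .then((loaded) => console.log(loaded)); // true
```

Keeping the state per site, rather than querying global install state, is what keeps `isLocaleLoaded` from becoming the fingerprinting oracle discussed earlier in this thread.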
Right, but if it requires a refresh to take effect when a new locale is needed, this requirement is satisfied.
Is this something we wish to require of users? I think that's rather onerous.
Especially if the use of this API would mostly be of the form "check if locale is loaded and then proceed" which works really well with promise chaining.
Imagine this scenario: a user logs in to a website where they have a locale preference set in their account. The website wishes to ensure locale data is loaded so that it can provide the best experience. It feels quite janky for the way to do that to be "before you do anything else, call .ensureLocaleLoaded(), and then refresh the page". It constrains the application a lot (they MUST do this at the very beginning of the pageload), and leads to a visible refresh.
I'm thinking about this a bit like XR.isSessionSupported() where you want to be able to change the experience presented to the user based on what is available, without restarting the page when they plug in a new device.
Another useful tool may be a special header value that webpages can set to ask the user agent to preload a locale. They'd still need to use a promise-based API, since the loading may not happen immediately.
(This would be at the level of a web specification, though, since we don't do headers here. But it fits in the larger picture.)
you can refresh it for them with location.reload(); that doesn’t seem too onerous.
Yes, my comment is not about the refresh being manual, it is that it is a pretty onerous thing to ask web developers to refresh their page halfway through; and it strongly impacts how they may design their webpage, alongside having a noticeable effect from the user's perspective.
I've pushed a major change to this proposal: instead of forbidding the installation of on-demand locales/locale components, the new language allows it if and only if no information about which non-standard locales are already installed gets disclosed. The normative text presents no preferred mechanism for doing this, though if this version of the text appears acceptable I can add a non-normative note proposing a strategy similar to @Manishearth's / WebXR's.
@dminor @srl295
It might be worth noting that one potential way to satisfy this constraint is to pretend that all on-the-fly-available locales are already installed.
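One way to read this suggestion, sketched under stated assumptions: the host answers availability queries from a fixed catalog shipped with the browser version (bundled locales plus everything fetchable on demand) and fetches data lazily on first use. `makeHost`, `BUNDLED`, `ON_DEMAND`, and the locale lists are hypothetical, and matching is simplified to exact tag equality rather than ECMA-402's fallback lookup. As noted earlier in the thread, the latency of a first-use fetch could still leak information.

```javascript
// Hypothetical catalog for one browser version: locales bundled with the
// build plus locales that can be fetched on demand.
const BUNDLED = ["en", "fr", "de"];
const ON_DEMAND = ["cy", "haw", "chr"];
const ADVERTISED = new Set([...BUNDLED, ...ON_DEMAND]);

// Builds a model host whose on-disk data may differ from user to user.
function makeHost(installedOnDisk) {
  const installed = new Set(installedOnDisk);
  return {
    // Answers from the fixed catalog only, so the result never depends on
    // what is installed. Exact-match filtering is a simplification.
    supportedLocalesOf(requested) {
      return Intl.getCanonicalLocales(requested).filter((tag) => ADVERTISED.has(tag));
    },
    // Fetches on-demand data the first time it is actually needed; the
    // caller receives no return value signalling whether a fetch happened.
    useLocale(tag) {
      if (ON_DEMAND.includes(tag) && !installed.has(tag)) {
        installed.add(tag); // stand-in for a real download
      }
    },
  };
}

const alice = makeHost(["en", "fr", "de"]);
const bob = makeHost(["en", "fr", "de", "cy"]);
const query = ["cy", "haw", "ja"];
// Both hosts give the same answer despite different on-disk data.
console.log(alice.supportedLocalesOf(query), bob.supportedLocalesOf(query));
```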
@ben-allen to update the spec text to incorporate the remainder of @Manishearth's feedback.
TG2 approval: https://github.com/tc39/ecma402/blob/master/meetings/notes-2023-09-07.md#normative-added-note-about-sets-of-locales-for-web-browser-implementations-needing-to-not-change-as-a-result-of-user-behaviour-780