
Confusing description of IsStructurallyValidLanguageTag() operation

Open iamstolis opened this issue 5 years ago • 6 comments

The description of the IsStructurallyValidLanguageTag() operation states that it

verifies that the locale argument represents a well-formed "Unicode BCP 47 locale identifier" as specified in Unicode Technical Standard 35 section 3.2,

It also says that

The abstract operation returns true if locale can be generated from the EBNF grammar in section 3.2 of the Unicode Technical Standard 35, starting with unicode_locale_id, and does not contain duplicate variant or singleton subtags (other than as a private use subtag). It returns false otherwise.

These requirements are inconsistent. The grammar mentioned does not describe a "Unicode BCP 47 locale identifier". It describes a "Unicode CLDR locale identifier", i.e. identifiers that support some backward-compatibility syntax (the root subtag, underscores as separators, tags starting with a script subtag) that is not allowed in a "Unicode BCP 47 locale identifier"; see Unicode Technical Standard 35 section 3.3. Please fix or improve the description of this operation so that it is clear what the operation is supposed to do.
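To make the difference concrete, here is a rough sketch of the two readings. It is deliberately simplified: it covers only the unicode_language_id part of the grammar (language, script, region, variants) and ignores extensions, private use, and the duplicate-variant check. The names `languageIdPattern`, `matchesCldrGrammar`, and `matchesBcp47Form` are made up for illustration and do not come from the spec.

```javascript
// Simplified sketch, NOT the spec algorithm: only the unicode_language_id
// portion of the UTS 35 section 3.2 grammar is modelled here.
const LANGUAGE = "(?:[a-z]{2,3}|[a-z]{5,8})";
const SCRIPT = "[a-z]{4}";
const REGION = "(?:[a-z]{2}|[0-9]{3})";
const VARIANT = "(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})";

// Hypothetical helper: builds a matcher for a separator class, optionally
// allowing the backward-compatibility forms ("root", leading script subtag).
function languageIdPattern(sep, allowCldrForms) {
  const tail = `(?:${sep}${REGION})?(?:${sep}${VARIANT})*`;
  const langScript = `${LANGUAGE}(?:${sep}${SCRIPT})?`;
  const body = allowCldrForms
    ? `(?:root|(?:${langScript}|${SCRIPT})${tail})`
    : `${langScript}${tail}`;
  return new RegExp(`^${body}$`, "i");
}

// Reading 1: follow the section 3.2 grammar only ("-" or "_" as separator).
const CLDR_PATTERN = languageIdPattern("[-_]", true);
// Reading 2: apply the section 3.3 restrictions as well (hyphen only,
// no "root", must start with a language subtag).
const BCP47_PATTERN = languageIdPattern("-", false);

function matchesCldrGrammar(tag) {
  return CLDR_PATTERN.test(tag);
}

function matchesBcp47Form(tag) {
  return BCP47_PATTERN.test(tag);
}

for (const tag of ["en-US", "en_US", "root", "Latn-US"]) {
  console.log(tag, matchesCldrGrammar(tag), matchesBcp47Form(tag));
}
```

Under these assumptions, "en-US" passes both checks, while "en_US", "root", and "Latn-US" pass only the grammar-only check.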

iamstolis avatar Apr 05 '20 19:04 iamstolis

@anba @jswalden @FrankYFTang

sffc avatar Apr 06 '20 06:04 sffc

It should refer to Unicode BCP 47 locale identifier - not the CLDR one.

zbraniecki avatar Apr 07 '20 18:04 zbraniecki

I don't really see the inconsistency. The grammar describes Unicode locale identifiers in general, and Unicode BCP 47 locale identifiers are a subset of this, as described in section 3.3. But the grammar is in section 3.2; there's no other grammar we could refer to. Should we link directly to section 3.3 in addition for clarity?

littledan avatar May 02 '20 09:05 littledan

I don't really see the inconsistency.

I find it hard not to see the inconsistency when one sentence says that the operation verifies that the argument

represents a well-formed "Unicode BCP 47 locale identifier"

and the following sentence says the equivalent of: The operation returns true if the argument is a "Unicode CLDR locale identifier" (and returns false otherwise).

The paragraph about returning true/false based on the unicode_locale_id grammar is incorrect and should be removed or reworded. Note that the main motivation behind this issue is that we had implemented this operation incorrectly (based on this incorrect paragraph, i.e., following the grammar only and therefore accepting the CLDR rather than the BCP 47 version) until @anba pointed out to me the intended meaning of IsStructurallyValidLanguageTag().

But the grammar is in section 3.2; there's no other grammar we could refer to. Should we link directly to section 3.3 in addition for clarity?

I see it in the opposite way: the term "Unicode BCP 47 locale identifier" is defined in section 3.3 (and not mentioned at all in section 3.2). So, section 3.3 is the more important section to point to. Of course, the grammar in section 3.2 is crucial for the understanding of the definition. Hence, linking section 3.2 in addition (for clarity) is a very good idea.
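One way to see which reading a given engine follows is to probe it through Intl.getCanonicalLocales(), which throws a RangeError for tags it considers structurally invalid. This is only a quick probe, not a conformance test; `engineAccepts` is a hypothetical helper name, and the output may differ between engines and versions.

```javascript
// Returns true if the engine accepts the tag, false if it rejects it
// with a RangeError. Any other error is rethrown.
function engineAccepts(tag) {
  try {
    Intl.getCanonicalLocales(tag);
    return true;
  } catch (e) {
    if (e instanceof RangeError) return false;
    throw e;
  }
}

// "en_US", "root" and "Latn-US" use the backward-compatibility syntax
// that section 3.3 excludes from Unicode BCP 47 locale identifiers.
for (const tag of ["en-US", "en_US", "root", "Latn-US"]) {
  console.log(tag, engineAccepts(tag));
}
```

An engine that accepts only Unicode BCP 47 locale identifiers would be expected to report false for every tag except "en-US".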

iamstolis avatar May 02 '20 20:05 iamstolis

FWIW https://github.com/tc39/ecma402/pull/429 would probably also resolve this by incidentally enumerating exact validity criteria and linking to both sections 3.2 and 3.3.

jswalden avatar May 03 '20 09:05 jswalden

@iamstolis @jswalden I added a note in #431 that should ideally clear up the confusion. Could you please take a quick look at that PR?

ryzokuken avatar May 03 '20 12:05 ryzokuken