Language Tags

Open TimidRobot opened this issue 3 years ago • 1 comments

1) Use IETF BCP 47 language tags instead of ISO 639-2 language codes

The documentation recommends ISO 639-2 language codes:

https://github.com/EthicalSource/contributor_covenant/blob/b6b84450510006e5386a3045d80142a329a76142/README.md?plain=1#L60-L61

However, I believe those are technically insufficient. Instead, I recommend using an IETF BCP 47 language tag. Thankfully, it is based on ISO 639-2, so the existing codes need no changes. It also provides additional information when needed. For example, if there is a translation into Serbian (language code sr), you need to specify whether the Latin or Cyrillic script is used: sr-latn or sr-cyrl.

IETF language tag - Wikipedia:

To distinguish language variants for countries, regions, or writing systems (scripts), IETF language tags combine subtags from other standards such as ISO 639, ISO 15924, ISO 3166-1 and UN M.49.

RFC 5646 - Tags for Identifying Languages provides a public specification.
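To illustrate how a tag carries more information than a bare language code, here is a minimal Python sketch that splits a tag into language, script, and region subtags. It covers only a simplified subset of the RFC 5646 grammar (no variants, extensions, or private-use subtags), and the name `split_tag` is hypothetical, not something from this repository.

```python
import re

# Simplified subset of the RFC 5646 grammar: language[-script][-region].
# Variants, extensions, and private-use subtags are deliberately not handled.
_TAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?$",
    re.IGNORECASE,
)


def split_tag(tag):
    """Return the language, script, and region subtags of a simple tag."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    return match.groupdict()


print(split_tag("sr"))       # language only
print(split_tag("sr-latn"))  # language plus script
print(split_tag("fa-IR"))    # language plus region
```

A four-letter second subtag is read as a script (ISO 15924) and a two-letter or three-digit one as a region (ISO 3166-1 or UN M.49), which is how sr-latn and fa-IR end up in different fields.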

2) Documentation leaves case ambiguous

The configuration file (config.toml (permalink)) currently contains only lowercase language codes, with one exception: fa-IR فارسی (ایران) [Persian (Iran)]. To prevent confusion and unnecessary redirects, I recommend explicitly stating that lowercase language tags should be used.
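A check like the following could flag tags that break the proposed lowercase rule. This is only a sketch: the function name `non_lowercase_tags` and the sample list are assumptions for illustration, and it does not read config.toml itself.

```python
def non_lowercase_tags(tags):
    """Return the tags that are not already fully lowercase."""
    return [tag for tag in tags if tag != tag.lower()]


# Hypothetical sample; only fa-IR is taken from the configuration discussed above.
sample = ["en", "sr-latn", "fa-IR"]
print(non_lowercase_tags(sample))  # ['fa-IR']
```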

3) Region vs Script

(I have the least confidence in this last recommendation.) It is my understanding that script codes serve the global community better than region codes (e.g., ~~zh-cn~~ ➡️ zh-hans and ~~zh-tw~~ ➡️ zh-hant).
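If this change were adopted, old region-based tags could be translated to script-based ones with a small lookup. The sketch below assumes only the two mappings given above; `REGION_TO_SCRIPT` and `prefer_script_tag` are hypothetical names, and any tag not in the table is returned unchanged apart from lowercasing.

```python
# Only the two mappings suggested in this issue; not an exhaustive table.
REGION_TO_SCRIPT = {
    "zh-cn": "zh-hans",
    "zh-tw": "zh-hant",
}


def prefer_script_tag(tag):
    """Map a region-based tag to its script-based equivalent when one is known."""
    lowered = tag.lower()
    return REGION_TO_SCRIPT.get(lowered, lowered)


print(prefer_script_tag("zh-CN"))  # zh-hans
print(prefer_script_tag("sr-latn"))  # sr-latn
```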

TimidRobot avatar Jul 06 '22 21:07 TimidRobot

Additional context for 3) Region vs Script: #18419 (Language code is not correct for Chinese) – Django

TimidRobot avatar Jul 06 '22 21:07 TimidRobot