Make language-data available in MediaWiki core

Open winstonsung opened this issue 7 months ago • 6 comments

We would like to bring language-data to MediaWiki core.

Questions

Should we move this repository to Gerrit/GitLab?

Reason for Gerrit:

  • We could easily keep the list of users with CR +2 rights the same as for mediawiki/core on Gerrit.
  • ~~It would be hard for integration of Depends-On on different platforms.~~
    (There's no CI injection/dependency feature for libraries in Wikimedia Gerrit.)

Reason for GitLab: Contributors aren't required to accept third party privacy policies.


It should be under Gerrit rather than GitLab because of the project layout decision.

This repository should fall under mediawiki/libs (i.e., be named mediawiki/libs/LanguageData and included in /vendor in mediawiki/core), since it should contain PHP code, and all mediawiki/libs/ projects are on Gerrit while none of them are on GitLab.

https://www.mediawiki.org/wiki/GitLab/Migration_status


Should composer.json be exported?

It looks like we need composer.json to be exported. Should it be removed from the export-ignore entries in .gitattributes?

https://gerrit.wikimedia.org/r/c/mediawiki/vendor/+/1056254
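
To make the export-ignore question concrete, here is a minimal sketch that lists which patterns in a .gitattributes file are marked export-ignore (and are therefore left out of archives produced by git archive). The function name and the sample file content are hypothetical and are not taken from this repository. The parser is deliberately simplified: it ignores quoting and does not resolve overlapping patterns.

```python
from __future__ import annotations


def export_ignored_patterns(gitattributes_text: str) -> list[str]:
    """Return patterns whose most recent export-ignore setting is 'set'."""
    state: dict[str, bool] = {}
    for raw in gitattributes_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        for attr in attrs:
            if attr == "export-ignore":
                state[pattern] = True
            elif attr in ("-export-ignore", "!export-ignore"):
                state[pattern] = False  # explicitly unset or unspecified
    return [p for p, ignored in state.items() if ignored]


# Hypothetical .gitattributes content, for illustration only.
sample = """\
# keep development files out of exported archives
.gitattributes export-ignore
composer.json export-ignore
tests/ export-ignore
"""

print(export_ignored_patterns(sample))
```

If composer.json shows up in that list, dropping its export-ignore line would be one way to have it included in exported archives again.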

Nikki wrote:

The language-data format doesn't support all the data they have (multiple scripts, Wikidata IDs, English names, parent language/families, etc), and requires data that is hard to get (autonyms), I think it would need big changes if it's ever going to be useful for things other than selecting a MediaWiki interface language.

Considerations

  • BCP 47 Language/script/region/variant subtags
  • ISO codes
    • NOTE: This is actually different from BCP 47 subtags.
  • MediaWiki internal language codes
  • Wikidata IDs
  • WikiLambda ZID
  • Autonyms (language name written in its local writing system)
  • The script in which a language is written
    • Multiple scripts
  • The regions in which the language is spoken/written
  • Translations of language names
    • English names
  • Language fallback chains
  • Parent language/families
  • The writing mode of the text
    • The directionality of the text
    • The writing-mode property of the text
  • Time formats
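
As a rough illustration of how the considerations above could fit into one record, here is a minimal sketch. The class name, field names, and defaults are all assumptions made for discussion; this is not the current language-data format and not a proposed final schema. The fallback value in the second example is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LanguageRecord:
    """Hypothetical record; every field name here is an assumption."""

    bcp47: str                             # BCP 47 tag (language/script/region/variant)
    mediawiki_code: str                    # MediaWiki internal language code
    iso639_3: Optional[str] = None         # ISO code, kept separate from BCP 47
    wikidata_id: Optional[str] = None      # Wikidata item ID
    wikilambda_zid: Optional[str] = None   # WikiLambda ZID
    autonym: Optional[str] = None          # optional, since autonyms are hard to get
    scripts: tuple[str, ...] = ()          # multiple scripts allowed
    regions: tuple[str, ...] = ()          # regions where spoken/written
    names: dict[str, str] = field(default_factory=dict)  # translated names, e.g. {"en": ...}
    fallbacks: tuple[str, ...] = ()        # language fallback chain
    parent: Optional[str] = None           # parent language/family
    direction: str = "ltr"                 # directionality of the text
    writing_mode: str = "horizontal-tb"    # writing-mode property of the text
    time_formats: dict[str, str] = field(default_factory=dict)

    def primary_script(self) -> Optional[str]:
        """Return the first listed script, or None if no script is recorded."""
        return self.scripts[0] if self.scripts else None


hebrew = LanguageRecord(
    bcp47="he", mediawiki_code="he", iso639_3="heb", autonym="עברית",
    scripts=("Hebr",), regions=("IL",), names={"en": "Hebrew"}, direction="rtl",
)

# The MediaWiki internal code differs from the BCP 47 tag here.
serbian_cyrillic = LanguageRecord(
    bcp47="sr-Cyrl", mediawiki_code="sr-ec", iso639_3="srp",
    scripts=("Cyrl",), names={"en": "Serbian (Cyrillic script)"},
    fallbacks=("sr",),  # illustrative value only
)
```

Making autonym and the external IDs optional reflects Nikki's point that some of this data is hard to obtain, while the separate bcp47, iso639_3, and mediawiki_code fields keep the three code systems from being conflated.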

Bug: T190129

winstonsung, Jul 24 '24 12:07