CLDR-11888 Update French speakers
https://unicode-org.atlassian.net/browse/CLDR-11888 was created to update the French speaker count for Djibouti, but while researching that I found that the French-speaking populations of many other Francophone countries were significantly underestimated as well. Most of those gaps probably come from the existing numbers counting only L1 users, whereas the point of this file is L1+L2 users: basically, how many people in each country could use an interface in this language.
See the original data in: https://www.francophonie.org/sites/default/files/2021-04/LFDM-20Edition-2019-La-langue-fran%C3%A7aise-dans-le-monde.pdf
CLDR-11888
- [x] This PR completes the ticket.
- [x] mvn package -DskipTests=true
- [x] java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData
- [x] java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelySubtags
ALLOW_MANY_COMMITS=true
Notice: the branch changed across the force-push!
- common/supplemental/likelySubtags.xml is now changed in the branch
~ Your Friendly Jira-GitHub PR Checker Bot
Changing the merge target to v47. I will also investigate making the document more stable, and I'll pursue these population count changes later.
My assumption is that the likelySubtags changes are caused by running the tool. Is that correct?
Yea, from running java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData
You also need to rebase it on v47.
I've updated the v47 branch.
I think https://www.francophonie.org/sites/default/files/2021-04/LFDM-20Edition-2019-La-langue-fran%C3%A7aise-dans-le-monde.pdf is overstating the case for French. I've seen the same mistake in other sources that over-estimate English competency (e.g., in Mauritius), so francophones are certainly not alone!
Our aim for the language population is to provide two figures, the overall population of reasonably competent L1+L2 speakers, and the "literacy" percentage of those. The 'literacy' percentage should really reflect usage, being a proxy for something like 'weekly active readers of the language'. When a language is written in multiple scripts, then it should also reflect the script usage. For example, if the language xx could be written in both Latin and Cyrillic, we'd expect to see xx_Latn and xx_Cyrl (one of them would be just xx, if that script is the default for xx globally).
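To make the script convention above concrete, here is a minimal sketch. The function name `population_key` is hypothetical and not part of the CLDR tooling, and treating Latn as the default script of the placeholder language xx is an assumption made only for the example.

```python
def population_key(language: str, script: str, default_script: str) -> str:
    """Return the key for a language/script pair.

    The globally default script stays implicit ("xx"); any other
    script is spelled out ("xx_Cyrl").
    """
    if script == default_script:
        return language
    return f"{language}_{script}"


# "xx" is the placeholder language from the comment above. Latn being its
# default script is an assumption for illustration only.
keys = [population_key("xx", s, default_script="Latn") for s in ("Latn", "Cyrl")]
print(keys)  # ['xx', 'xx_Cyrl']
```

Each key would then carry its own population and literacy figures, so script usage can differ between xx and xx_Cyrl.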
The 'competency' is very roughly "if the person were literate, could they read and understand an application UI in the language, including help messages, instructions for usage, etc."
Now, we rarely get detailed information about language capabilities; sometimes the figures we see include just L1, sometimes also L2, and the sources typically don't say which. So we have to use a fair amount of judgement in assessing various reports. For languages that are rarely written, such as Swiss German, in the absence of good information we tend to use a literacy value of 5%.
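As a rough illustration of how the two figures combine, here is a small sketch. The class and field names are hypothetical, and the population and speaker percentage are round placeholder numbers, not values from country_language_population.tsv; only the 5% literacy figure mirrors the fallback mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageEstimate:
    """Hypothetical holder for one country/language row."""

    country_population: int
    speaker_percent: float   # reasonably competent L1+L2 speakers, % of population
    literacy_percent: float  # % of those speakers who regularly read the language

    def speakers(self) -> int:
        # People who could understand a UI in the language.
        return round(self.country_population * self.speaker_percent / 100)

    def readers(self) -> int:
        # Speakers who actually use the language in written form.
        return round(self.speakers() * self.literacy_percent / 100)


# Placeholder inputs: 1,000,000 people, 60% L1+L2 speakers, 5% literacy.
example = LanguageEstimate(
    country_population=1_000_000,
    speaker_percent=60,
    literacy_percent=5,
)
print(example.speakers(), example.readers())  # 600000 30000
```

A source that counts only L1 users would lower speaker_percent, which is the kind of gap the PR description refers to; an over-optimistic source would raise it.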
Still needs a rebase to reflect the branch change.
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- common/supplemental/likelySubtags.xml is different
- common/supplemental/supplementalData.xml is different
- common/testData/localeIdentifiers/likelySubtags.txt is now changed in the branch
- common/testData/localeIdentifiers/localeDisplayName.txt is now changed in the branch
- tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv is different
~ Your Friendly Jira-GitHub PR Checker Bot
Thanks @macchiati for checking the data -- yea it definitely looks suspect. I was just re-basing this diff since it was on a pretty stale branch. I'll need to interrogate the data better. That Swiss government source looks great.
Notice: the branch changed across the force-push!
- common/supplemental/likelySubtags.xml is different
- common/supplemental/supplementalData.xml is different
- common/testData/localeIdentifiers/likelySubtags.txt is different
- common/testData/localeIdentifiers/localeDisplayName.txt is no longer changed in the branch
- tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv is different
~ Your Friendly Jira-GitHub PR Checker Bot