cldr CLDR-11888 Update French speakers

https://unicode-org.atlassian.net/browse/CLDR-11888 was created to update the number of French speakers for Djibouti, but while researching that I found many other Francophone countries where the French-speaking population is significantly underestimated. Most of those gaps probably come from the existing numbers counting only L1 users, whereas the point of this file is L1+L2 users -- basically how many people in each country could use an interface in this language.

See the original data in: https://www.francophonie.org/sites/default/files/2021-04/LFDM-20Edition-2019-La-langue-fran%C3%A7aise-dans-le-monde.pdf

CLDR-11888

[x] This PR completes the ticket.
[x] mvn package -DskipTests=true
[x] java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData
[x] java -jar tools/cldr-code/target/cldr-code.jar GenerateLikelySubtags

ALLOW_MANY_COMMITS=true

Aug 27 '24 17:08 conradarcturus

Notice: the branch changed across the force-push!

common/supplemental/likelySubtags.xml is now changed in the branch

~ Your Friendly Jira-GitHub PR Checker Bot

Aug 27 '24 18:08 jira-pull-request-webhook[bot]

Changing the merge target to v47 and also will investigate making the document more stable -- I'll pursue these population count changes later.

Aug 28 '24 16:08 conradarcturus

My assumption is that the likelySubtags changes are caused by running the tool. Is that correct?

Yea, from running java -jar tools/cldr-code/target/cldr-code.jar ConvertLanguageData

Aug 28 '24 16:08 conradarcturus

Changing the merge target to v47 and also will investigate making the document more stable -- I'll pursue these population count changes later.

You also need to rebase it on v47.

Aug 29 '24 16:08 srl295

I've updated the v47 branch.

Aug 29 '24 16:08 srl295

I think https://www.francophonie.org/sites/default/files/2021-04/LFDM-20Edition-2019-La-langue-fran%C3%A7aise-dans-le-monde.pdf is overstating the case for French. I've seen the same mistakes show up in other sources that over-estimate English competency (e.g., in Mauritius), so francophones are certainly not alone!

Our aim for the language population is to provide two figures, the overall population of reasonably competent L1+L2 speakers, and the "literacy" percentage of those. The 'literacy' percentage should really reflect usage, being a proxy for something like 'weekly active readers of the language'. When a language is written in multiple scripts, then it should also reflect the script usage. For example, if the language xx could be written in both Latin and Cyrillic, we'd expect to see xx_Latn and xx_Cyrl (one of them would be just xx, if that script is the default for xx globally).
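To make the two figures concrete, here is a minimal Python sketch. The class name, field names, and all numbers are hypothetical and chosen only for illustration; this is not CLDR's actual schema or data, just one way to read the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguagePopulation:
    """Hypothetical record for one language (optionally with script) in one territory."""

    locale: str               # e.g. "xx" or "xx_Cyrl"; the bare code implies the default script
    territory_population: int
    speaker_percent: float    # reasonably competent L1+L2 speakers, as % of the territory
    literacy_percent: float   # share of those speakers who actively read the language in this script

    def speakers(self) -> int:
        # Overall L1+L2 speaker count in the territory.
        return round(self.territory_population * self.speaker_percent / 100)

    def active_readers(self) -> int:
        # Rough proxy for "weekly active readers" in this script.
        return round(self.speakers() * self.literacy_percent / 100)


# Made-up numbers: language "xx" written in Latin (assumed default) and Cyrillic.
entries = [
    LanguagePopulation("xx", 1_000_000, 40.0, 90.0),
    LanguagePopulation("xx_Cyrl", 1_000_000, 40.0, 10.0),
]

for entry in entries:
    print(entry.locale, entry.speakers(), entry.active_readers())
```

With these invented inputs, both entries share the same speaker count and differ only in how much of that population reads the language in each script.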

The 'competency' is very roughly "if the person were literate, could they read and understand an application UI in the language, including help messages, instructions for usage, etc."

Now, we rarely get a lot of detailed information about language capabilities; sometimes the figures we see include just L1, sometimes also L2, and the sources typically don't say which. So we have to use a fair amount of judgement in assessing various reports. For languages that are rarely written, such as Swiss German, in the absence of good information we tend to get a literacy value of 5%.
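The fallback for rarely written languages could be sketched roughly as below. The function name and the `rarely_written` flag are hypothetical; only the 5% figure comes from the comment above, and treating it as a coded default is an assumption, since in practice the value is a judgement call.

```python
from typing import Optional

# Percent; the fallback figure mentioned above for rarely written languages.
DEFAULT_RARELY_WRITTEN_LITERACY = 5.0


def literacy_percent(reported: Optional[float], rarely_written: bool) -> Optional[float]:
    """Pick a literacy percentage for a language in a territory.

    `reported` is a figure taken from a source, or None when no good
    information is available.
    """
    if reported is not None:
        if not 0.0 <= reported <= 100.0:
            raise ValueError(f"literacy must be a percentage, got {reported}")
        return reported
    if rarely_written:
        # No good data and the language is rarely written: use the 5% fallback.
        return DEFAULT_RARELY_WRITTEN_LITERACY
    # No basis for a default; leave the value for human judgement.
    return None
```

A reported figure always wins over the fallback, so `literacy_percent(72.5, True)` returns the reported value, while `literacy_percent(None, True)` returns the 5% default.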

Aug 29 '24 17:08 macchiati

Still needs a rebase to reflect the branch change.

Sep 09 '24 17:09 srl295

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

Oct 25 '24 17:10 jira-pull-request-webhook[bot]

Notice: the branch changed across the force-push!

common/supplemental/likelySubtags.xml is different
common/supplemental/supplementalData.xml is different
common/testData/localeIdentifiers/likelySubtags.txt is now changed in the branch
common/testData/localeIdentifiers/localeDisplayName.txt is now changed in the branch
tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv is different

~ Your Friendly Jira-GitHub PR Checker Bot

Oct 29 '24 17:10 jira-pull-request-webhook[bot]

Thanks @macchiati for checking the data -- yea it definitely looks suspect. I was just re-basing this diff since it was on a pretty stale branch. I'll need to interrogate the data better. That Swiss government source looks great.

Oct 30 '24 14:10 conradarcturus

Notice: the branch changed across the force-push!

common/supplemental/likelySubtags.xml is different
common/supplemental/supplementalData.xml is different
common/testData/localeIdentifiers/likelySubtags.txt is different
common/testData/localeIdentifiers/localeDisplayName.txt is no longer changed in the branch
tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/country_language_population.tsv is different

~ Your Friendly Jira-GitHub PR Checker Bot

Nov 05 '24 00:11 jira-pull-request-webhook[bot]