ICU-22596 IBM-1388 converter data update

Open yumaoka opened this issue 1 year ago • 3 comments

Checklist
  • [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22596
  • [x] Required: The PR title must be prefixed with a JIRA Issue number.
  • [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • [x] Required: Each commit message must be prefixed with a JIRA Issue number.
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [ ] Tests included, if applicable
  • [ ] API docs and/or User Guide docs changed or added, if applicable

yumaoka avatar Aug 09 '24 17:08 yumaoka

Notice: the branch changed across the force-push!

  • icu4c/source/data/mappings/convrtrs.txt is now changed in the branch
  • icu4j/main/charset/src/main/resources/com/ibm/icu/impl/data/icudata/cnvalias.icu is now changed in the branch
  • icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/testdata/te.res is now changed in the branch
  • icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/testdata/testtypes.res is now changed in the branch
  • icu4j/main/translit/src/main/resources/com/ibm/icu/impl/data/icudata/translit/root.res is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

This also includes some ICU4J test data changes that are unrelated to the charset mapping data: te.res, testtypes.res and root.res. These files were updated when I generated the ICU4J data from the ICU4C tree. They will need to be updated eventually.

yumaoka avatar Aug 09 '24 18:08 yumaoka

@markusicu

<UE78D> \xFC\x90 |4

Should we use |1 (fallback from Unicode to code page) instead of |4 (good one-way mapping from Unicode to code page)?
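For readers unfamiliar with the notation, here is a minimal sketch of how a .ucm mapping line such as the one above splits into Unicode code points, code page bytes and the precision flag. The function name `parse_ucm_mapping` and the `PRECISION` table are hypothetical helpers written for illustration, not part of ICU or its tools. The flag descriptions follow my reading of the ICU User Guide and should be checked against it.

```python
import re

# Precision flags for .ucm mapping lines, as I understand the ICU User Guide.
# This table is illustrative; consult the guide for the authoritative wording.
PRECISION = {
    0: "roundtrip mapping",
    1: "fallback mapping, Unicode to code page only",
    2: "subchar1 mapping for unmappable Unicode input",
    3: "reverse fallback mapping, code page to Unicode only",
    4: "good one-way mapping, Unicode to code page only",
}

# One or more <Uxxxx> tokens, one or more \xNN bytes, then an optional |N flag.
_MAPPING = re.compile(
    r"^((?:<U[0-9A-Fa-f]{4,6}>)+)\s+((?:\\x[0-9A-Fa-f]{2})+)\s*(?:\|([0-4]))?"
)


def parse_ucm_mapping(line):
    """Return (code_points, code_page_bytes, flag) for one mapping line.

    flag is None when the line carries no precision indicator.
    """
    match = _MAPPING.match(line.strip())
    if match is None:
        raise ValueError(f"not a .ucm mapping line: {line!r}")
    code_points = [
        int(h, 16) for h in re.findall(r"<U([0-9A-Fa-f]{4,6})>", match.group(1))
    ]
    code_page_bytes = bytes(
        int(h, 16) for h in re.findall(r"\\x([0-9A-Fa-f]{2})", match.group(2))
    )
    flag = int(match.group(3)) if match.group(3) is not None else None
    return code_points, code_page_bytes, flag


if __name__ == "__main__":
    cps, data, flag = parse_ucm_mapping(r"<UE78D> \xFC\x90 |4")
    print([f"U+{cp:04X}" for cp in cps], data.hex(), PRECISION.get(flag))
```

Both |1 and |4 map only in the Unicode to code page direction, so the question is which kind of one-way mapping the entry should be, not which bytes it produces.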

yumaoka avatar Aug 29 '24 16:08 yumaoka

> This also includes some ICU4J test data changes that are unrelated to the charset mapping data: te.res, testtypes.res and root.res. These files were updated when I generated the ICU4J data from the ICU4C tree. They will need to be updated eventually.

I looked at the testdata source files and don't see anything that should be affected by modifying the IBM-1388 conversion table. I can't imagine that transliteration should be affected either. Please revert these three.

markusicu avatar Sep 06 '24 17:09 markusicu

> I looked at the testdata source files and don't see anything that should be affected by modifying the IBM-1388 conversion table. I can't imagine that transliteration should be affected either. Please revert these three.

When I build it, the ICU4J data is not fully synced with ICU4C. I understand that not all resources should be affected, but the build script handles everything in one shot. Probably someone updated ICU4C but did not update the corresponding ICU4J data. I can rebase, try again, and see whether there are any diffs.

yumaoka avatar Sep 06 '24 20:09 yumaoka

> When I build it, the ICU4J data is not fully synced with ICU4C. I understand that not all resources should be affected, but the build script handles everything in one shot. Probably someone updated ICU4C but did not update the corresponding ICU4J data. I can rebase, try again, and see whether there are any diffs.

If there are diffs, they should still be unrelated to the conversion table change. I think it's best not to change unrelated files in the same PR.

markusicu avatar Sep 06 '24 20:09 markusicu

Notice: the branch changed across the force-push!

  • icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/testdata/te.res is no longer changed in the branch
  • icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/testdata/testtypes.res is no longer changed in the branch
  • icu4j/main/translit/src/main/resources/com/ibm/icu/impl/data/icudata/translit/root.res is no longer changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

I did the following:

  • Rebased onto the latest main
  • Made no changes to the .ucm file
  • Regenerated the ICU4J data just in case. There are now no diffs other than the ones for this change, so the unrelated .res files in ICU4J are no longer included

Please also look at https://github.com/unicode-org/icu-data/pull/41. For that one, I added the new table without deleting the old one.

yumaoka avatar Sep 06 '24 20:09 yumaoka