hermes icon indicating copy to clipboard operation
hermes copied to clipboard

Add Intl.Collator implementation using ICU4C

Open robchu05 opened this issue 1 year ago • 11 comments

Summary

Add Intl.Collator implementation using ICU4C for non-Android / non-Apple platforms.

Classes without ICU / Platform API dependency:

  • Constants - String constants for options bag’s names and values, valid option values, other constants.
  • IntlUtils - Conversion between UTF-8 and UTF-16 ASCII strings, convert string to bool, i.e. case- insensitive match against “true”, lowercase ASCII strings.
  • LocaleBCP47Object - Wrapper over a BCP47Parser::ParsedLocaleIdentifier, provides factory method to create instances from a BCP47 locale string, methods to get canonicalized locale string and locale string without extensions, and implements the CanonicalizeLocaleList() spec operation.
  • OptionHelpers - Gets string, bool, and number option from options bag.

ICU dependent classes:

  • Collator - Intl.Collator implementation using ICU collator API.
  • Locale - Conversion between BCP47 and ICU locale strings.
  • LocaleResolver - Implements ResolveLocale() and SupportedLocales() spec operations, supports both “lookup” and “best fit” locale matching. “best fit” locale matching uses ICU acceptLanguage API.

Fix memory leaks in existing DateTimeFormat implementation in PlatformIntlICU.cpp by wrapping created ICU object pointers in unique_ptr.

Migrate ResolveLocale, SupportedLocalesOf, GetCanonicalLocales API implementation to use the corresponding newly added supporting classes.

Add test cases to verify invalid locales and options input would result in throwing JS exception in collation.js. Add tests cases to verify resolution of locale extensions and options in a new file collation-resolved-options.js.

Test Plan

Build succeeds on Ubuntu 20.04. Run the following JS tests on Ubuntu 20.04, and all pass except for a few test cases in date-time-format-apple.js due to locale data differences. The output of running date-time-format-apple.js test is the same as before this change.

$ ./build/bin/hermes ./hermes/test/hermes/intl/intl.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/get-canonical-locales.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/get-canonical-locales.js
$ ./build/bin/hermes ./hermes/test/hermes/intl/collator.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/collator.js
$ LC_ALL=fr_FR _HERMES_TEST_LOCALE=fr_FR ./build/bin/hermes ./hermes/test/hermes/intl/collator-resolved-options.js
$ TZ=GMT ./build/bin/hermes ./hermes/test/hermes/intl/date-time-format-apple.js | ./build/bin/FileCheck --match-full-lines ./hermes/test/hermes/intl/date-time-format-apple.js

Run valgrind with above test runs and no memory leak detected.

Run Test262 tests for Collator, String-LocaleCompare, DateTimeFormat, Date, and Intl. All non-skipped test cases pass.

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Collator/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       61 |
| Pass                 |       45 |
| Fail                 |        0 |
| Skipped              |       16 |
| Permanently Skipped  |        0 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/String/prototype/localeCompare/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       10 |
| Pass                 |        9 |
| Fail                 |        0 |
| Skipped              |        1 |
| Permanently Skipped  |        0 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/DateTimeFormat/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |      176 |
| Pass                 |       50 |
| Fail                 |        0 |
| Skipped              |      125 |
| Permanently Skipped  |        1 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Date/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       12 |
| Pass                 |       10 |
| Fail                 |        0 |
| Skipped              |        2 |
| Permanently Skipped  |        0 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

$ ./hermes/utils/testsuite/run_testsuite.py -b ./build/bin --test-intl -a ~/Downloads/test262/test/intl402/Intl/
...
-----------------------------------
| Results              |   PASS   |
|----------------------+----------|
| Total                |       66 |
| Pass                 |       20 |
| Fail                 |        0 |
| Skipped              |       19 |
| Permanently Skipped  |       27 |
| Pass Rate            |  100.00% |
-----------------------------------
| Failures             |          |
|----------------------+----------|
| Compile fail         |        0 |
| Compile timeout      |        0 |
| Execute fail         |        0 |
| Execute timeout      |        0 |
-----------------------------------

robchu05 avatar May 30 '24 21:05 robchu05

Hi @robchu05!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot avatar May 30 '24 21:05 facebook-github-bot

Thank you for putting this together!

I haven't gone into the details yet, but wanted to first share some high level organisational comments. In particular, I'm wary of cluttering the Platform/Intl directory with too many files that are specific to this ICU implementation. It looks like there are also dependencies going both ways between the Platorm/Intl directory and Platform/Intl/icu_impl directory, so it's difficult to reason about how things are organised.

My preference in this instance would be to flatten more of the internals into PlatformIntlICU.cpp (in the same way that PlatformIntlApple.mm is self-contained), which will reduce the need for things like Constants.h. Note that we have many files in Hermes where classes that provide internal functionality live only in the relevant cpp file, so there is no hard rule requiring them to be in separate files.

Where you feel that doesn't make sense, the extra files should all be in icu_impl, so it is clear that they exist only to support the ICU implementation.

neildhar avatar Jun 13 '24 16:06 neildhar

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

facebook-github-bot avatar Jun 13 '24 19:06 facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

facebook-github-bot avatar Jun 13 '24 19:06 facebook-github-bot

Hi @neildhar,

Thanks for the code organization feedback. I agree with the comments on cluttering Platform/Intl and dependencies going both ways between Platform/Intl and Platform/Intl/impl_icu at this point in time.

My intention is to place the files with no ICU dependency in Platform/Intl so that they can be used in / shared with the Apple implementation in the future, not just for the ICU implementation. Also, in the future, I would like to refactor much more code blocks that aren't dependent on ICU out from Platform/Intl/impl_icu to Platform/Intl to share with the Apple implementation. This may not be the case if it is decided to change the Apple implementation to route the Intl calls to JSC per https://github.com/facebook/hermes/discussions/1211.

Since we aren't yet consolidating implementation across platforms, the code organization in this revision does look wonky. Rather than having this code organization split currently, how about I move all the new implementation files into Platform/Intl/impl_icu for the time being, until we are ready to start on the implementation consolidation work?

Regarding flatten more of the internals into PlatformIntlICU.cpp, I am concerned that the file will grow to be tens of thousands of lines as more and eventually complete coverage of Intl APIs are implemented, which will become hard to manage and continue development. Thus, I would prefer organizing into separate files.

robchu05 avatar Jul 08 '24 18:07 robchu05

@robchu05 Sure, if you feel it makes it easier to keep things split, that's fine too. My comment was primarily about keeping the ICU implementation internals together.

neildhar avatar Jul 12 '24 16:07 neildhar

@tmikov, @neildhar, friendly reminder to review this PR. Appreciate your feedback and help in guiding me through.

robchu05 avatar Aug 01 '24 21:08 robchu05

Hi @neildhar,

Thanks for the review and feedback. I target to address them and post a new revision by end of next week.

robchu05 avatar Aug 22 '24 18:08 robchu05

Hi @neildhar,

I updated the PR to address your feedback. Please have a look 😀

I would also like to get your inputs on the tests.

The CircleCI test failure on test-macos-test262 job is on the additional test cases that I added in test/hermes/intl/. I can work on fixing PlatformIntlApple for a few test cases. For other cases that are not straightforward to fix on Apple and are not critical, we can consider splitting the Hermes intl tests so that one set is run for mac and one set is run for linux.

Another thing to consider is to add a test-linux-test262 job similar to test-macos-test262 on CircleCI. I think that would involve adding a different set of intl test skip list for the test262 test suite because the intl coverage on linux with icu4c is not yet matching Apple platform and some intl test cases in current skip list (for Apple) pass, thus do not need to be skipped, on linux.

Appreciate your thoughts on the above! Thanks!

robchu05 avatar Aug 31 '24 00:08 robchu05

Hey @robchu05 thanks for updating the PR, I haven't forgotten about this, I've been travelling for the last few weeks and will take a look once I'm back next week.

neildhar avatar Sep 10 '24 15:09 neildhar

Thank you for letting me know, @neildhar. Hope you are having a pleasant and safe travel.

robchu05 avatar Sep 10 '24 17:09 robchu05

Hi @neildhar, hope you had a fun trip! Friendly reminder to review the new revision and also the testing update suggestions in my earlier comment. Thanks.

robchu05 avatar Sep 23 '24 23:09 robchu05

Hi @neildhar,

Please have a look at the latest changes, starting with the 4th commit "Fix handling of negative test cases and update test configurations". The commits after that are small fix-ups to circle-ci configuration to address run failures. Thank you!

Regarding the windows and test-windows job failures, they are due to compilation errors:

C:\tmp\hermes\hermes\unittests\Support\StackBoundsTest.cpp(35,14): error C3861: '__builtin_frame_address': identifier not found [C:\tmp\hermes\build\unittests\Support\HermesSupportTests.vcxproj]
C:\tmp\hermes\hermes\unittests\Support\StackBoundsTest.cpp(43,14): error C3861: '__builtin_frame_address': identifier not found [C:\tmp\hermes\build\unittests\Support\HermesSupportTests.vcxproj]

Error log line link: https://app.circleci.com/pipelines/github/facebook/hermes/6944/workflows/4df96b47-12a8-42a4-a9f7-5c095d23252f/jobs/71264?invite=true#step-106-84684_132

The file StackBoundsTest.cpp with the compilation error is recently introduced in https://github.com/facebook/hermes/commit/a5b2bbcf33cf7f475b3d7dce96be2371eb5fda0b.

robchu05 avatar Oct 17 '24 20:10 robchu05

@robchu05 sorry about that, by bad. I just added that test and apparently our CI didn't flag it as broken on Windows.

tmikov avatar Oct 18 '24 00:10 tmikov

@neildhar has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Oct 23 '24 16:10 facebook-github-bot

@robchu05 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Oct 23 '24 17:10 facebook-github-bot

@robchu05 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Oct 23 '24 18:10 facebook-github-bot

@robchu05 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Oct 23 '24 19:10 facebook-github-bot

@robchu05 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Oct 23 '24 19:10 facebook-github-bot

Hi @neildhar, anything else needed for merging this pull request?

robchu05 avatar Oct 29 '24 22:10 robchu05

@neildhar has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Oct 29 '24 23:10 facebook-github-bot

@neildhar , any edits I need to make to address the Facebook Internal failures?

robchu05 avatar Oct 31 '24 21:10 robchu05

@robchu05 Looks like they're just formatting issues, I'll fix those internally before committing.

neildhar avatar Nov 01 '24 14:11 neildhar

Hi @neildhar, will you be merging this PR some time this week?

robchu05 avatar Nov 05 '24 23:11 robchu05

@neildhar merged this pull request in facebook/hermes@a84b2f9b2f5a8bcd9341ad8e50bbd3eb1debbd82.

facebook-github-bot avatar Nov 07 '24 12:11 facebook-github-bot