ICU-22834 MF2: make tests compliant with schema and update spec tests
This PR changes the code in the ICU4C and ICU4J tests that reads data-driven tests in JSON format to follow the test schema in the conformance repo, as well as updating the spec tests to the version in the spec repo.
Due to the changed spec tests, some non-test code changes were necessary to get the tests to pass:
- ICU4C: Fixed number literal parsing in
Number::format(this wasn't being done according to spec before, and previously there were no tests to expose that) - ICU4C: Fixed parsing of
.inputdeclarations, which had assumed the annotation can't be a reserved annotation - ICU4C: Fixed bug where
.iwas a parse error rather than an unsupported keyword - ICU4C: Make parsing of text and escape characters consistent with spec
- ICU4C: Allow markup with space after the initial '{'
- ICU4C: Check for duplicate variant errors (which were recently added to the spec; ICU4J already checked for these errors)
- ICU4J: Allow trailing whitespace after complex messages (recent spec change)
- ICU4J: Treat whitespace after
.inputas optional - ICU4J: Don't format unannotated number literals as numbers
- Both: Allow leading whitespace before complex messages (also a recent spec change)
Note that some manual changes to the spec tests are necessary; ICU4J currently returns best-effort output rather than throwing exceptions for resolution errors, so tests where a resolution error (e.g. unknown-function or unsupported-statement) is expected need to be annotated with an ignoreJava property. This can be removed once the error handling story has been resolved (see MFWG issue 782).
The more interesting changes needed to parse the test files according to the schema include:
- Updating error names according to schema
- Updating how test params are specified (as an array of objects with
nameandvalueproperties, rather than as an object) - Eliminating the
srcsproperty to represent multi-line messages, and instead, allowingsrcto be either a single string or array of strings (consistent with this schema change) - Handling default test properties
This also made it possible to move all the test filenames into one file in the ICU4J tests (CoreTest.java) and delete the others, since tests are handled uniformly now; analogously, removing extra methods in the ICU4C test reader.
Finally, the non-spec tests needed to be changed themselves; the main changes were to change the format of the params property (mentioned above) and changing errors to expErrors. In addition, one file (icu-test-selectors.json) needed to be "expanded out" because the schema doesn't support the shared and variations properties.
Checklist
- [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22834
- [x] Required: The PR title must be prefixed with a JIRA Issue number.
- [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
- [ ] Required: Each commit message must be prefixed with a JIRA Issue number.
- [ ] Issue accepted (done by Technical Committee after discussion)
- [x] Tests included, if applicable
- [x] API docs and/or User Guide docs changed or added, if applicable
Notice: the branch changed across the force-push!
- icu4c/source/i18n/messageformat2_parser.h is different
- testdata/message2/reserved-syntax.json is different
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- testdata/message2/reserved-syntax.json is different
~ Your Friendly Jira-GitHub PR Checker Bot
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFParser.java is different
- icu4j/main/core/src/main/java/com/ibm/icu/message2/StringUtils.java is now changed in the branch
- testdata/message2/reserved-syntax-2.json is now changed in the branch
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- icu4c/source/config/dist.mk is no longer changed in the branch
- icu4c/source/i18n/messageformat2_data_model.cpp is no longer changed in the branch
- icu4c/source/i18n/messageformat2_function_registry.cpp is different
- icu4c/source/i18n/messageformat2_parser.cpp is different
- icu4c/source/i18n/messageformat2_parser.h is no longer changed in the branch
- icu4c/source/i18n/messageformat2_serializer.cpp is no longer changed in the branch
- icu4c/source/i18n/unicode/messageformat2_data_model.h is no longer changed in the branch
- icu4c/source/test/intltest/intltest.cpp is no longer changed in the branch
- icu4c/source/test/intltest/intltest.h is no longer changed in the branch
- icu4c/source/test/testdata/message2/duplicate-declarations.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/icu4j/icu-parser-tests.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/invalid-options.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/markup.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/more-data-model-errors.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/README.txt is no longer changed in the branch
- icu4c/source/test/testdata/message2/reserved-syntax.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/resolution-errors.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/runtime-errors.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/spec/data-model-errors.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/spec/syntax-errors.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/spec/test-core.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/spec/test-functions.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/tricky-declarations.json is no longer changed in the branch
- icu4c/source/test/testdata/message2/valid-tests.json is no longer changed in the branch
- icu4j/main/common_tests/src/test/java/com/ibm/icu/dev/test/message2/Mf2FeaturesTest.java is no longer changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/InputSource.java is no longer changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFDataModelFormatter.java is no longer changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFParser.java is different
- icu4j/main/core/src/main/java/com/ibm/icu/message2/StringUtils.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/CoreTest.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/CustomFormatterPersonTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/DataModelErrorsTest.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/FunctionsTest.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/IcuFunctionsTest.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/MessageFormat2Test.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/SerializationTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/SyntaxErrorsTest.java is different
- icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/data-model-errors.json is no longer changed in the branch
- icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/icu-parser-tests.json is no longer changed in the branch
- icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/icu-test-previous-release.json is no longer changed in the branch
- icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/icu-test-selectors.json is no longer changed in the branch
- icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/syntax-errors.json is no longer changed in the branch
- icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/test-core.json is no longer changed in the branch
- icu4j/main/core/src/test/resources/com/ibm/icu/dev/test/message2/test-functions.json is no longer changed in the branch
- icu4j/pom.xml is no longer changed in the branch
- icu4j/releases_tools/github_release.sh is no longer changed in the branch
- testdata/message2/icu-test-functions.json is different
- testdata/message2/more-syntax-errors.json is now changed in the branch
- testdata/message2/reserved-syntax-2.json is no longer changed in the branch
- testdata/message2/reserved-syntax.json is different
- testdata/message2/spec/test-functions.json is now changed in the branch
- testdata/message2/syntax-errors-diagnostics.json is different
- testdata/message2/valid-tests.json is different
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/Sources.java is different
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/CoreTest.java is different
- testdata/message2/icu-test-functions-multiline.json is no longer changed in the branch
- testdata/message2/icu-test-functions.json is different
- testdata/message2/more-functions-multiline.json is no longer changed in the branch
- testdata/message2/more-functions.json is different
~ Your Friendly Jira-GitHub PR Checker Bot
I can't request reviews, but: @markusicu @srl295 @echeran - and @mihnita might want to take a look at the Java changes.
I added another commit to this PR to update the spec tests again, which meant I had to pull in some more commits that were initially in https://github.com/unicode-org/icu/pull/3092 (draft PR) in order to make the tests pass.
Also, I'll be on vacation from now until September 9, so I won't see any review comments until I return.
as a followon i'd move the number parsing into ufmt_cmn.cpp
as a followon i'd move the number parsing into ufmt_cmn.cpp
See above comment about introducing a circular dependency between io and i18n.
There is an email thread on how to organize the test data. Where the idea is to have some kind of shared folder in the root of the repo, with separate sub-folders for testdata ICU only and testdata from cldr.
Because some of the json files here are from cldr "as is" (or at least they should be). And some are ICU only, shared between C++/Java, but not from cldr.
The cldr test files are copied to icu by an ant script.
If you don't know what thread I'm talking about (with vacation and all it is understandable if it was lost), tell me and I'll ping you by email with a link.
@mihnita:
There is an email thread on how to organize the test data. Where the idea is to have some kind of shared folder in the root of the repo, with separate sub-folders for testdata ICU only and testdata from cldr.
Do you think it would be OK to handle that in a separate PR?
@mihnita:
Am I wrong, of there was another PR that was kind of similar, but not really?
Yes, there's a draft PR that I intend as a follow-up to this one: https://github.com/unicode-org/icu/pull/3092
Rebasing now, since I think all the line comments have been addressed.
Notice: the branch changed across the force-push!
- icu4c/source/common/unicode/char16ptr.h is no longer changed in the branch
- icu4c/source/common/unicode/platform.h is no longer changed in the branch
- icu4c/source/common/unicode/unistr.h is no longer changed in the branch
- icu4c/source/common/uniset_props.cpp is no longer changed in the branch
- icu4c/source/common/unistr.cpp is no longer changed in the branch
- icu4c/source/i18n/messageformat2_function_registry.cpp is different
- icu4c/source/i18n/number_decimalquantity.cpp is no longer changed in the branch
- icu4c/source/test/intltest/ustrtest.cpp is no longer changed in the branch
- icu4c/source/test/intltest/ustrtest.h is no longer changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFDataModelFormatter.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/IcuFunctionsTest.java is different
~ Your Friendly Jira-GitHub PR Checker Bot
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
@echeran It looks like the "enforce-all-checks" check failed, but that was the only check that failed. I think this is the same issue described in the icu-team email thread from yesterday?
@echeran It looks like the "enforce-all-checks" check failed, but that was the only check that failed. I think this is the same issue described in the icu-team email thread from yesterday?
retriggered, clean now. The ironic thing is that I was wondering if CLDR needs one of the same enforce-all-checks but now i'm not so sure!
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
@echeran Squashed -- I think you don't need to reapprove since nothing changed, but if you could click the merge button, that would be great!
Thank you! M.