DRAFT: MF2: fix various parser bugs and add more tests
DRAFT wip, do not review, depends on https://github.com/unicode-org/icu/pull/3063
Checklist
- [ ] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-_____
- [ ] Required: The PR title must be prefixed with a JIRA Issue number.
- [ ] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
- [ ] Required: Each commit message must be prefixed with a JIRA Issue number.
- [ ] Issue accepted (done by Technical Committee after discussion)
- [ ] Tests included, if applicable
- [ ] API docs and/or User Guide docs changed or added, if applicable
Sorry, the sharing of the test files PR started when I was in vacation and somehow didn't get on my radar when I got back.
But I just found that the change broke Eclipse, and that now the json test files are packaged in the release jar files of ICU.
I general having Maven consume files from outside it's own folder tree is hacky and error prone. And when it finally works, we discover that it breaks the import to various IDEs. I think that we would want to support at least Eclipse, IntelliJ, VS Code.
We currently have that problem with the LICENSE file (I can't import icu4j in Eclipse anymore) I'm working on that...
The other problem with sharing is that it forces the c/c++ and Java implementations to be 100% in sync at all times. Having them work the same is in general a good thing, of course.
But becomes a PITA when something changes and we need to update the code. Because (often) the C++ and Java devs might not be the same.
This PR being an example.
TLDR: I am tempted to keep two copies of the test data. There is an ant task in tools/cldr/ that (copy-cldr-testdata) that copies some test data from CLDR.
These json files are probably in the same bucket: they live in CLDR, but must be tested against in ICU.
But the idea is that the ant task can copy the files in two places, icu4c and icu4j. If code breaks, but one has time / expertise for C/C++ only, they can open a ticket against icu4j, priority zero, but revert the test files in icu4j until that issue is fixed.
I don't know if that is a good idea or not. But this is what we already do for other tests.
Here is a fragment of the ant script:
<fileset id="cldrTestData" dir="${cldrDir}/common/testData">
<!-- Add directories here to control which test data is installed. -->
<include name="localeIdentifiers/**"/> <!-- ... -->
<include name="personNameTest/**"/> <!-- Used in ExhaustivePersonNameTest -->
<include name="units/**"/> <!-- Used in UnitsTest tests -->
</fileset>
<copy todir="${testDataDir4C}">
<fileset refid="cldrTestData"/>
</copy>
<copy todir="${testDataDir4J}">
<fileset refid="cldrTestData"/>
</copy>
We copy the same test files in icu4c/source/test/testdata/cldr and icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/cldr
Try:
diff -r icu4c/source/test/testdata/cldr icu4j/main/core/src/test/resources/com/ibm/icu/dev/data/cldr
There are 3 sub-folders there: localeIdentifiers, personNameTest, units
I would propose we add another one (messageformat2) and do the same thing.
Sorry, the sharing of the test files PR started when I was in vacation and somehow didn't get on my radar when I got back.
But I just found that the change broke Eclipse, and that now the json test files are packaged in the release jar files of ICU.
I general having Maven consume files from outside it's own folder tree is hacky and error prone. And when it finally works, we discover that it breaks the import to various IDEs. I think that we would want to support at least Eclipse, IntelliJ, VS Code.
Of course; is there a way to add automated testing that changes don't break IDEs?
We currently have that problem with the LICENSE file (I can't import icu4j in Eclipse anymore) I'm working on that...
The other problem with sharing is that it forces the c/c++ and Java implementations to be 100% in sync at all times. Having them work the same is in general a good thing, of course.
But becomes a PITA when something changes and we need to update the code. Because (often) the C++ and Java devs might not be the same.
This PR being an example.
TLDR: I am tempted to keep two copies of the test data. There is an ant task in tools/cldr/ that (copy-cldr-testdata) that copies some test data from CLDR.
I don't want to unilaterally say yes or no to this; is it something to discuss in the TC meeting, maybe? (Note: I'll be on vacation from early next week until September 2, so I won't be at the next few meetings, but I can go back and read minutes later.)
First, thank you very much for taking a stab at updating the Java implementation to the latest spec. I was planning to do that, but it looks like trying to share the test files precipitated things somewhat :-)
Right, first it was precipitated by sharing test files; then I wrote a random test generator and decided it was only fair to run ICU4J on some of the generated tests as well as ICU4C, so that's where some of the other changes came from.
I added some comments.
But if there is a decisions to treat the mf2 test files the same way we treat other CLDR test files (units, people names, locale ids) and separate the C++ / Java tests, I can take over the Java part. And in a different PR.
Certainly (like I said in another comment, I think that has to be discussed with the TC).
I think I've addressed all your comments, but let me know if I missed anything!
Notice: the branch changed across the force-push!
- icu4c/source/i18n/messageformat2_data_model.cpp is no longer changed in the branch
- icu4c/source/i18n/messageformat2_function_registry.cpp is no longer changed in the branch
- icu4c/source/i18n/messageformat2_macros.h is no longer changed in the branch
- icu4c/source/i18n/messageformat2_parser.cpp is different
- icu4c/source/i18n/messageformat2_parser.h is different
- icu4c/source/i18n/messageformat2_serializer.cpp is different
- icu4c/source/i18n/unicode/messageformat2_data_model.h is no longer changed in the branch
- icu4c/source/test/intltest/messageformat2test_read_json.cpp is no longer changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFDataModelFormatter.java is no longer changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFParser.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/CoreTest.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/DataModelErrorsTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/DefaultTestProperties.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/FirstReleaseTests.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/FunctionsTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/IcuFunctionsTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/MF2Test.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/Param.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/ParserSmokeTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/SelectorsWithVariousArgumentsTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/Sources.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/StringToListAdapter.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/SyntaxErrorsTest.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/TestUtils.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/Unit.java is no longer changed in the branch
- testdata/message2/alias-selector-annotations.json is no longer changed in the branch
- testdata/message2/duplicate-declarations.json is no longer changed in the branch
- testdata/message2/icu-parser-tests.json is no longer changed in the branch
- testdata/message2/icu-test-functions.json is no longer changed in the branch
- testdata/message2/icu-test-previous-release.json is no longer changed in the branch
- testdata/message2/icu-test-selectors.json is no longer changed in the branch
- testdata/message2/invalid-number-literals-diagnostics.json is no longer changed in the branch
- testdata/message2/invalid-options.json is no longer changed in the branch
- testdata/message2/markup.json is no longer changed in the branch
- testdata/message2/matches-whitespace.json is no longer changed in the branch
- testdata/message2/more-data-model-errors.json is no longer changed in the branch
- testdata/message2/more-functions.json is no longer changed in the branch
- testdata/message2/more-syntax-errors.json is no longer changed in the branch
- testdata/message2/README.txt is no longer changed in the branch
- testdata/message2/reserved-syntax.json is no longer changed in the branch
- testdata/message2/resolution-errors.json is no longer changed in the branch
- testdata/message2/runtime-errors.json is no longer changed in the branch
- testdata/message2/spec/data-model-errors.json is no longer changed in the branch
- testdata/message2/spec/functions/date.json is no longer changed in the branch
- testdata/message2/spec/functions/datetime.json is no longer changed in the branch
- testdata/message2/spec/functions/integer.json is no longer changed in the branch
- testdata/message2/spec/functions/number.json is no longer changed in the branch
- testdata/message2/spec/functions/string.json is no longer changed in the branch
- testdata/message2/spec/functions/time.json is no longer changed in the branch
- testdata/message2/spec/syntax-errors.json is no longer changed in the branch
- testdata/message2/spec/test-core.json is no longer changed in the branch
- testdata/message2/spec/test-functions.json is no longer changed in the branch
- testdata/message2/syntax-errors-diagnostics-multiline.json is no longer changed in the branch
- testdata/message2/syntax-errors-end-of-input.json is no longer changed in the branch
- testdata/message2/tricky-declarations.json is no longer changed in the branch
- testdata/message2/valid-tests.json is different
~ Your Friendly Jira-GitHub PR Checker Bot
I've rebased this PR, so it's now ready for review.
~~Note that this includes several fixes related to unsupported expression/statement (reserved) parsing. Recently these features were removed from the spec. However, since the recent spec changes haven't been integrated yet, I'd prefer to land these changes. I'm open to suggestions, though.~~ Changed my mind; since reserved syntax was removed from ICU4J already, I've removed reserved-related changes from this PR.
/cc @echeran
Notice: the branch changed across the force-push!
- icu4c/source/test/intltest/messageformat2test_read_json.cpp is now changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFParser.java is different
- icu4j/main/core/src/main/java/com/ibm/icu/message2/StringUtils.java is no longer changed in the branch
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/CoreTest.java is different
- testdata/message2/reserved-syntax-2.json is different
~ Your Friendly Jira-GitHub PR Checker Bot
Notice: the branch changed across the force-push!
- icu4c/source/i18n/messageformat2_parser.cpp is different
- icu4c/source/i18n/messageformat2_serializer.cpp is different
- icu4c/source/test/intltest/messageformat2test_read_json.cpp is different
- icu4j/main/core/src/main/java/com/ibm/icu/message2/InputSource.java is no longer changed in the branch
- icu4j/main/core/src/main/java/com/ibm/icu/message2/MFParser.java is different
- icu4j/main/core/src/test/java/com/ibm/icu/dev/test/message2/CoreTest.java is different
- testdata/message2/reserved-syntax-2.json is no longer changed in the branch
- testdata/message2/syntax-errors-reserved.json is now changed in the branch
- testdata/message2/valid-tests.json is different
~ Your Friendly Jira-GitHub PR Checker Bot
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot