ICU-22794 MF2: De-duplicate C++ and Java test data
This PR moves all the .json files for data-driven ICU4C and ICU4J tests into a single top-level directory, icu/testdata/.
I did my best to remove duplicate tests.
This involved fixing several ICU4J parser bugs, almost all involving whitespace handling.
Some tests still had to be ignored in either Java or C++, in which cases I filed ICU tickets.
Checklist
- [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22794
- [x] Required: The PR title must be prefixed with a JIRA Issue number.
- [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
- [x] Required: Each commit message must be prefixed with a JIRA Issue number.
- [x] Issue accepted (done by Technical Committee after discussion)
- [x] Tests included, if applicable
- [x] API docs and/or User Guide docs changed or added, if applicable
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot
/cc @echeran @mihnita
Maybe this was discussed before, but for example ICU tarballs in https://github.com/unicode-org/icu/releases/tag/release-75-1 do not have this format.
Perhaps we should have a situation where the testdata gets copied in the ICU4C and ICU4J tarballs at a high level? I.e. so source and testdata are siblings.
Then the code could check ../testdata (source tarball) and if that doesn't exist then check ../../testdata (in-repo) and then fail.
Maybe this was discussed before, but for example ICU tarballs in https://github.com/unicode-org/icu/releases/tag/release-75-1 do not have this format.
Perhaps we should have a situation where the
testdatagets copied in the ICU4C and ICU4J tarballs at a high level? I.e. sosourceandtestdataare siblings.Then the code could check
../testdata(source tarball) and if that doesn't exist then check../../testdata(in-repo) and then fail.
In ICU4C, 0ef54a5 should do this.
I'm not sure how to do the same for ICU4J, though. I got as far as finding the icu4j/releases_tools/github_release.sh, but I can't get it to run without errors, even in a branch with no local changes. I pasted the relevant part of the output into https://gist.github.com/catamorphism/1c9ea0e26580dd18ef69852de931ce8d . It's giving me an error about the javadocs. So I can't get far enough to make the change to copy the testdata/ directory into the jar file and test the change. Do you have any suggestions?
I'm not sure how to do the same for ICU4J, though. I got as far as finding the
icu4j/releases_tools/github_release.sh, but I can't get it to run without errors, even in a branch with no local changes. I pasted the relevant part of the output into https://gist.github.com/catamorphism/1c9ea0e26580dd18ef69852de931ce8d . It's giving me an error about the javadocs. So I can't get far enough to make the change to copy thetestdata/directory into the jar file and test the change. Do you have any suggestions?
I was able to get it to work by editing the github_release.sh script as follows:
# mvn clean install -DskipITs -DskipTests -P with_sources,with_javadoc
mvn clean install -DskipITs -DskipTests -P with_sources
mvn install -DskipITs -DskipTests -P with_javadoc
though I don't know if that breaks anything. However, I'm not sure how to run the tests in the source .jar file once it's been created.
Looks like the build failures are timeouts.
I'm not sure how to do the same for ICU4J, though. I got as far as finding the
icu4j/releases_tools/github_release.sh, but I can't get it to run without errors, even in a branch with no local changes. I pasted the relevant part of the output into https://gist.github.com/catamorphism/1c9ea0e26580dd18ef69852de931ce8d . It's giving me an error about the javadocs.
that has to do with needing to use an older java to build.
So I can't get far enough to make the change to copy the
testdata/directory into the jar file and test the change. Do you have any suggestions?
I'm not sure, but I can also try to test/fix this in the PR https://github.com/unicode-org/icu/pull/3053
I was able to get it to work by editing the
github_release.shscript as follows:# mvn clean install -DskipITs -DskipTests -P with_sources,with_javadoc mvn clean install -DskipITs -DskipTests -P with_sources mvn install -DskipITs -DskipTests -P with_javadocthough I don't know if that breaks anything. However, I'm not sure how to run the tests in the source .jar file once it's been created.
You have to unpack the jar and then compile everything.
@catamorphism OK, well all tests pass on ICU4C (verified they are loading the files from the common area) and on ICU4J they fail with this (which at least shows that the test data is being read from its location!)
[ERROR] Failures:
[ERROR] FromJsonTest.test:417 TestCase {
message: {hello {(world)}}',
locale: 'en-US',
arguments: {},
expected: 'hello world',
ignore: false,
ignoreReason: '',
errors: []
}
No error was expected here, but it happened:
Parse error:
Message: <<{hello {(world)}}>>
Error: Parse error [8]: Unclosed placeholder
{hello {^^^(world)}}
[ERROR] FunctionsTest.test:32 UnitTest {src="Default int64: {$val}!", params={val=1.2345678901234568E18}, exp="Default int64: 1.234.567.890.123.456.770!"} expected:<...234.567.890.123.456.[77]0!> but was:<...234.567.890.123.456.[80]0!>
[INFO]
[ERROR] Tests run: 2833, Failures: 2, Errors: 0, Skipped: 26
though I don't know if that breaks anything. However, I'm not sure how to run the tests in the source .jar file once it's been created.
You have to unpack the jar and then compile everything.
Do any of the .jar files actually contain tests? I noticed that the github_release.sh script passes the flag -DskipTests when creating files, and after running the script, there are no test files in icu4j/target/github_release/icu4j-76_0_1-sources.jar.
seems to be in the right direction, need to verify with ICU-TC whether sys/stat.h is OK for inclusion.
No need, I changed it to check for a text file.
@catamorphism OK, well all tests pass on ICU4C (verified they are loading the files from the common area) and on ICU4J they fail with this (which at least shows that the test data is being read from its location!)
What command are you running to get that output? It's weird because the string {hello {(world)}} doesn't appear anywhere in either top-level icu4j or testdata in this branch, nor does it appear in icu4j in the main branch.
Hooray! The files in the branch are the same across the force-push. 😃
~ Your Friendly Jira-GitHub PR Checker Bot