icu icon indicating copy to clipboard operation
icu copied to clipboard

ICU-22794 MF2: De-duplicate C++ and Java test data

Open catamorphism opened this issue 1 year ago • 3 comments

This PR moves all the .json files for data-driven ICU4C and ICU4J tests into a single top-level directory, icu/testdata/.

I did my best to remove duplicate tests.

This involved fixing several ICU4J parser bugs, almost all involving whitespace handling.

Some tests still had to be ignored in either Java or C++, in which cases I filed ICU tickets.

Checklist
  • [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22794
  • [x] Required: The PR title must be prefixed with a JIRA Issue number.
  • [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • [x] Required: Each commit message must be prefixed with a JIRA Issue number.
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [x] Tests included, if applicable
  • [x] API docs and/or User Guide docs changed or added, if applicable

catamorphism avatar Jun 26 '24 22:06 catamorphism

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

/cc @echeran @mihnita

catamorphism avatar Jun 27 '24 21:06 catamorphism

Maybe this was discussed before, but for example ICU tarballs in https://github.com/unicode-org/icu/releases/tag/release-75-1 do not have this format.

Perhaps we should have a situation where the testdata gets copied in the ICU4C and ICU4J tarballs at a high level? I.e. so source and testdata are siblings.

Then the code could check ../testdata (source tarball) and if that doesn't exist then check ../../testdata (in-repo) and then fail.

srl295 avatar Jun 28 '24 21:06 srl295

Maybe this was discussed before, but for example ICU tarballs in https://github.com/unicode-org/icu/releases/tag/release-75-1 do not have this format.

Perhaps we should have a situation where the testdata gets copied in the ICU4C and ICU4J tarballs at a high level? I.e. so source and testdata are siblings.

Then the code could check ../testdata (source tarball) and if that doesn't exist then check ../../testdata (in-repo) and then fail.

In ICU4C, 0ef54a5 should do this.

I'm not sure how to do the same for ICU4J, though. I got as far as finding the icu4j/releases_tools/github_release.sh, but I can't get it to run without errors, even in a branch with no local changes. I pasted the relevant part of the output into https://gist.github.com/catamorphism/1c9ea0e26580dd18ef69852de931ce8d . It's giving me an error about the javadocs. So I can't get far enough to make the change to copy the testdata/ directory into the jar file and test the change. Do you have any suggestions?

catamorphism avatar Jul 17 '24 22:07 catamorphism

I'm not sure how to do the same for ICU4J, though. I got as far as finding the icu4j/releases_tools/github_release.sh, but I can't get it to run without errors, even in a branch with no local changes. I pasted the relevant part of the output into https://gist.github.com/catamorphism/1c9ea0e26580dd18ef69852de931ce8d . It's giving me an error about the javadocs. So I can't get far enough to make the change to copy the testdata/ directory into the jar file and test the change. Do you have any suggestions?

I was able to get it to work by editing the github_release.sh script as follows:

# mvn clean install -DskipITs -DskipTests -P with_sources,with_javadoc
mvn clean install -DskipITs -DskipTests -P with_sources
mvn install -DskipITs -DskipTests -P with_javadoc

though I don't know if that breaks anything. However, I'm not sure how to run the tests in the source .jar file once it's been created.

catamorphism avatar Jul 17 '24 23:07 catamorphism

Looks like the build failures are timeouts.

catamorphism avatar Jul 25 '24 19:07 catamorphism

I'm not sure how to do the same for ICU4J, though. I got as far as finding the icu4j/releases_tools/github_release.sh, but I can't get it to run without errors, even in a branch with no local changes. I pasted the relevant part of the output into https://gist.github.com/catamorphism/1c9ea0e26580dd18ef69852de931ce8d . It's giving me an error about the javadocs.

that has to do with needing to use an older java to build.

So I can't get far enough to make the change to copy the testdata/ directory into the jar file and test the change. Do you have any suggestions?

I'm not sure, but I can also try to test/fix this in the PR https://github.com/unicode-org/icu/pull/3053

I was able to get it to work by editing the github_release.sh script as follows:

# mvn clean install -DskipITs -DskipTests -P with_sources,with_javadoc
mvn clean install -DskipITs -DskipTests -P with_sources
mvn install -DskipITs -DskipTests -P with_javadoc

though I don't know if that breaks anything. However, I'm not sure how to run the tests in the source .jar file once it's been created.

You have to unpack the jar and then compile everything.

srl295 avatar Jul 31 '24 21:07 srl295

@catamorphism OK, well all tests pass on ICU4C (verified they are loading the files from the common area) and on ICU4J they fail with this (which at least shows that the test data is being read from its location!)

[ERROR] Failures: 
[ERROR]   FromJsonTest.test:417 TestCase {
  message: {hello {(world)}}',
  locale: 'en-US',
  arguments: {},
  expected: 'hello world',
  ignore: false,
  ignoreReason: '',
  errors: []
}
No error was expected here, but it happened:
Parse error:
Message: <<{hello {(world)}}>>
Error: Parse error [8]: Unclosed placeholder
{hello {^^^(world)}}

[ERROR]   FunctionsTest.test:32 UnitTest {src="Default int64: {$val}!", params={val=1.2345678901234568E18}, exp="Default int64: 1.234.567.890.123.456.770!"} expected:<...234.567.890.123.456.[77]0!> but was:<...234.567.890.123.456.[80]0!>
[INFO] 
[ERROR] Tests run: 2833, Failures: 2, Errors: 0, Skipped: 26

srl295 avatar Jul 31 '24 23:07 srl295

though I don't know if that breaks anything. However, I'm not sure how to run the tests in the source .jar file once it's been created.

You have to unpack the jar and then compile everything.

Do any of the .jar files actually contain tests? I noticed that the github_release.sh script passes the flag -DskipTests when creating files, and after running the script, there are no test files in icu4j/target/github_release/icu4j-76_0_1-sources.jar.

catamorphism avatar Aug 01 '24 01:08 catamorphism

seems to be in the right direction, need to verify with ICU-TC whether sys/stat.h is OK for inclusion.

No need, I changed it to check for a text file.

catamorphism avatar Aug 01 '24 01:08 catamorphism

@catamorphism OK, well all tests pass on ICU4C (verified they are loading the files from the common area) and on ICU4J they fail with this (which at least shows that the test data is being read from its location!)

What command are you running to get that output? It's weird because the string {hello {(world)}} doesn't appear anywhere in either top-level icu4j or testdata in this branch, nor does it appear in icu4j in the main branch.

catamorphism avatar Aug 01 '24 02:08 catamorphism

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot