icu icon indicating copy to clipboard operation
icu copied to clipboard

ICU-22511 collator hang in comparison

Open FrankYFTang opened this issue 8 months ago • 16 comments

Avoid infinity loop by terminating and returning error if we loop the same for too many times.

Checklist
  • [X] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22511
  • [X] Required: The PR title must be prefixed with a JIRA Issue number.
  • [X] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • [X] Required: Each commit message must be prefixed with a JIRA Issue number.
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [X] Tests included, if applicable
  • [ ] API docs and/or User Guide docs changed or added, if applicable

FrankYFTang avatar Oct 17 '23 23:10 FrankYFTang

to test java run mvn verify -Dtest="com.ibm.icu.dev.test.collator.CollationTest" -Dsurefire.failIfNoSpecifiedTests=false -ff

(Googlers- make sure you did the /etc/group trick on your machine to make the build faster first)

FrankYFTang avatar Oct 18 '23 18:10 FrankYFTang

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/collationcompare.cpp is now changed in the branch
  • icu4c/source/test/intltest/collationtest.cpp is different
  • icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationCompare.java is now changed in the branch
  • icu4j/main/collate/src/test/java/com/ibm/icu/dev/test/collator/CollationTest.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@markusicu friendly ping. this bug was filed Nov 2, 2023 by fuzzer and and the PR is ready for review for a week.

FrankYFTang avatar Nov 20 '23 18:11 FrankYFTang

I think both are out of office for Thanksgiving week.

macchiati avatar Nov 20 '23 23:11 macchiati

I took a quick look at the code, and I am concerned. It appears to return as equal whenever three times in a row, a primary is equal. But then we would get the clearly incorrect:

aaah = aaagh

Each pass through the main loop should be resetting either the right or left CEs, or both. Because getting a ce advances an internal pointer, that should always terminate. So something fishy is going on.

I suggest that you augment your printout to provide more readable information, since lines like the following are too hard to follow: ce from left 0xa4000500 ce from left 0x2a00000005009c00 ...

I suggest when you print, break down the left and right ce values in to primary, secondary, tertiary, and put tabs between the fields so that this can be loaded into a spreadsheet. Eg something like: ... Lp 2a0000 Ls 0500 Lt 0042 ...

I can't remember, but you make also be able to fetch a code point offset from the ce; if so it would help to include that in the printout.

macchiati avatar Nov 21 '23 00:11 macchiati

sorry, I assume you look at the Java code first. My java change is indeed wrong. I should throw there not return equal.

FrankYFTang avatar Nov 21 '23 00:11 FrankYFTang

Notice: the branch changed across the force-push!

  • icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationCompare.java is different
  • icu4j/main/collate/src/test/java/com/ibm/icu/dev/test/collator/CollationTest.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Thanks. I'm still a bit worried about the change, because of the aaah case, so I'd appreciate a printout for the test case along the lines I suggested.

macchiati avatar Nov 21 '23 01:11 macchiati

I looked at this and didn't understand what it was doing or why. I think I'll just defer to Mark here.

richgillam avatar Nov 28 '23 23:11 richgillam

@markusicu ping

FrankYFTang avatar Dec 08 '23 22:12 FrankYFTang

ping @markusicu . this is more urgent than other PR because it is now able to hang v8 with very simple script (see the bug for details)

FrankYFTang avatar Dec 18 '23 18:12 FrankYFTang