there can be only one UnicodeProperty.java
Background: ucd-dev email thread “Unicode Tools vs. UnicodeProperty.java”
We have two versions of org/unicode/cldr/util/props/UnicodeProperty.java, one in CLDR and one in the Unicode Tools. Until 1.5 years ago they were identical, but they have been diverging and are now no longer even interface-compatible. The CLDR version is the newer one.
For now, I am adding a note to the setup instructions to move cldr-code above unicodetools in the build path, but of course that's evil and brittle.
We had agreed to move UnicodeProperty from CLDR into the Unicode Tools, but other CLDR code depends on this class, and I am not sure if I have time to get through that. Maybe I just rename the unused Unicode Tools version at first.
There is also a third version of UnicodeProperty.java, in the UnicodeJsps code, but thankfully it is in a different package. Still, this is confusing and error-prone; we should either merge the classes or give them distinct names.
- https://github.com/unicode-org/cldr/blob/master/tools/cldr-code/src/main/java/org/unicode/cldr/util/props/UnicodeProperty.java
- https://github.com/unicode-org/unicodetools/blob/main/unicodetools/src/main/java/org/unicode/cldr/util/props/UnicodeProperty.java (location after Mavenization move)
- ~~https://github.com/unicode-org/unicodetools/blob/main/UnicodeJsps/src/main/java/org/unicode/jsp/UnicodeProperty.java~~ Removed in #649, 2024-01-20.
- See https://unicode-org.atlassian.net/browse/CLDR-9787 for the CLDR side of duplicate classes between ICU/CLDR (similar problem), and https://unicode-org.atlassian.net/browse/ICU-21757 for what may be a needed part of the solution
- ~Once #64 is done, we should change UnicodeJsps to be dependent on unicodetools. This will allow the classes to be present only once and shared between those two.~ (filed this as https://github.com/unicode-org/unicodetools/issues/131 )
I tried at one point to pull UnicodeProperty out from CLDR, but it is not simple. There are classes that depend on classes and so on, soit was too hard to figure out what the tangled set of dependencies were, and where I could cut the cord to allow me to extract a set of classes and tie off the loose ends.
Mark
On Sat, May 29, 2021 at 10:45 AM Markus Scherer @.***> wrote:
Background: ucd-dev email thread “Unicode Tools vs. UnicodeProperty.java https://groups.google.com/g/ucd-dev/c/H60xj1kTfCU”
We have two versions of org/unicode/cldr/util/props/UnicodeProperty.java, one in CLDR and one in the Unicode Tools. Until 1.5 years ago they were identical, but they have been diverging and are now no longer even interface-compatible. The CLDR version is the newer one.
For now, I am adding a note to the setup instructions to move cldr-code above unicodetools in the build path, but of course that's evil and brittle.
We had agreed to move UnicodeProperty from CLDR into the Unicode Tools, but other CLDR code depends on this class, and I am not sure if I have time to get through that. Maybe I just rename the unused Unicode Tools version at first.
There is also a third version of UnicodeProperty.java, in the UnicodeJsps code, but thankfully it is in a different package. Still, this is confusing and error-prone; we should either merge the classes or give them distinct names.
https://github.com/unicode-org/cldr/blob/master/tools/cldr-code/src/main/java/org/unicode/cldr/util/props/UnicodeProperty.java
https://github.com/unicode-org/unicodetools/blob/main/unicodetools/org/unicode/cldr/util/props/UnicodeProperty.java
https://github.com/unicode-org/unicodetools/blob/main/UnicodeJsps/src/main/java/org/unicode/jsp/UnicodeProperty.java
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/unicode-org/unicodetools/issues/66, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMDLXLVTZAO7LDBIWP3TQER4DANCNFSM45YP3R3A .
Josh's analysis: https://docs.google.com/spreadsheets/d/1sAIFut5QL30EXwMK_ZM4yKzWJge2gatLA0yfl_zxtSY/edit?usp=sharing
As @srl295 posted, I have performed a basic analysis of duplicated .java files as a starting point. I attempted a very basic diff to try to triage a bit but I think that was not terribly useful with the multi-file overlaps. Sadly, looks like only one duplicated file is identical.
So I think the next step will be to look more closely into the overlaps and determine commonalities/differences inside and figure out what can be re-worked, removed, etc.
Now that https://github.com/unicode-org/unicodetools/pull/148 (#131) is merged, the JSPs can easily call into unicodetools librarycode.
this bit me trying to do #186 (IDNA)— because of the order of loading, ICU4J's UnicodeSet ends up calling into unicodetools' UnicodeProperty and crashes.
Hmmm. We could rename unicodetools' version
On Wed, Mar 23, 2022 at 11:55 AM Steven R. Loomis @.***> wrote:
this bit me trying to do #186 https://github.com/unicode-org/unicodetools/issues/186 (IDNA)— because of the order of loading, ICU4J's UnicodeSet ends up calling into unicodetools' UnicodeProperty and crashes.
— Reply to this email directly, view it on GitHub https://github.com/unicode-org/unicodetools/issues/66#issuecomment-1076705347, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACJLEMBQ54LIJJAYF35ETIDVBNSIJANCNFSM45YP3R3A . You are receiving this because you commented.Message ID: @.***>
@macchiati I'm going to try moving unicodetools' version into org.unicode.props …
Back burner on this due to CLDR release, but just as an example of the layering problems:
https://github.com/unicode-org/unicodetools/blob/8d6bd2e50b541f68bb86ae4205213f7af71aa8bb/unicodetools/src/main/java/org/unicode/tools/TestSegments.java#L151-L155
private static void doCompare(UnicodeProperty.Factory factory, Segmenter rl, String line) {
RandomStringGenerator rsg;
RuleBasedBreakIterator icuBreak;
if (line.equals("compareGrapheme")) {
rsg = new RandomStringGenerator(factory, "GraphemeClusterBreak");
So here we are calling into CLDR (RandomStringGenerator) but passing it unicodetools' version of UnicodeProperty.Factory. The only way this “works” is basically that UnicodeProperty.class from CLDR gets replaced at runtime with unicodetools' … not good
It looks like CLDR RandomStringGenerator has other problems, too. It sounds generic, but its init() function is specific to working with the GCB property. And when you call the (Factory, String, bool, bool) constructor, you get some data from the input factory and other data from the ICUPropertyFactory.
If this is used from CLDR code, then I would clone it into Unicode Tools (with a different package and modified class name). If it's not used from CLDR code, then move it into Unicode Tools.
It might also be possible to pull all of the factory calls out into the call site and construct this generator only with UnicodeSet/UnicodeMap/String/boolean values.
Observations (not complaints):
- it's very clear that our development tools / environment / process has had quite an impact on how things are structured. Copying source files was easy, dependencies and reuse between projects hard/impossible, even between the unicodetools and the JSPs
- I think CLDR should be split up into modules. Maybe it should depend on unicodetools and not the other way around!
I verified that the RandomStringGenerator is not used in CLDR. Please move it out of there and into the Unicode Tools. https://unicode-org.atlassian.net/browse/CLDR-15482
@markusicu my #214 does copy it into unicodetools paving the way for deletion on cldr.
It's "now" illegal (as of JDK 9?) to have more than one module export the same package. So, CLDR and ICU aren't supposed to both export to com.ibm.icu.text.
Something I learned (and others might have realized some time ago):
The Unicode Tools use the CLDR UnicodePropertySymbolTable which calls into UnicodeProperty. That must be a CLDR UnicodeProperty, which is what the symbol table code calls from within the CLDR jar. The Tools provide the symbol table with a ToolUnicodeProperty (which lives in the Tools) which extends UnicodeProperty. Bad things happen when CLDR code uses the Unicode Tools version of UnicodeProperty.
Unicode Tools code needs the Unicode Tools version of UnicodeProperty.
David changed CLDR UnicodeProperty to base its Matcher classes on a Java Predicate: https://unicode-org.atlassian.net/browse/CLDR-13750 https://github.com/unicode-org/cldr/pull/460
The Unicode Tools version is still basing the Matchers on an older ObjectMatcher interface.
FYI In pull request #214 @srl295 copied most of the classes from https://github.com/unicode-org/cldr/tree/main/tools/cldr-code/src/main/java/org/unicode/cldr/util/props into the unicodetools. It looks like only abstract class UnicodeLabel (the base for all of the UnicodeProperty classes) still exists only in CLDR.
meld ~/cldr/uni/src/tools/cldr-code/src/main/java/org/unicode/cldr/util/props/UnicodeProperty.java unicodetools/src/main/java/org/unicode/props/UnicodeProperty.java
@eggrobin just deleted the UnicodeJSPs version in PR #649.