icu icon indicating copy to clipboard operation
icu copied to clipboard

ICU-23061 LowercaseTransliterator has poor scalability due to synchronized state

Open stevenschlansker opened this issue 9 months ago • 20 comments

LowercaseTransliterator tries to reuse buffers to avoid allocation. Historically, this may have been an important optimization. However on modern machines you are much more likely to use multiple CPUs, and modern JVMs are very efficient at GC, especially for local temporary data. Contending on a monitor is a big deal nowadays, especially for something so low level and hidden.

I've prepared a benchmark using JMH to show the change in scalability:

Linux 6.12.13-200.fc41.x86_64 AMD Ryzen 7 3700X 8-Core Processor
JMH version: 1.37
JDK 23.0.2, OpenJDK 64-Bit Server VM, 23.0.2+7

Original `synchronized`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   789.169 ±  2.779  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5217.250 ± 35.501  ops/ms

8 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   759.934 ±  4.880  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  3916.430 ± 70.545  ops/ms

New `stateless`:

1 Thread Benchmark                         Mode  Cnt     Score    Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   779.504 ±  6.507  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  5641.104 ± 29.747  ops/ms

8 Thread Benchmark                         Mode  Cnt      Score      Error   Units
LowercaseTransliteratorPerf.testSentence  thrpt    5   6340.495 ±  240.436  ops/ms
LowercaseTransliteratorPerf.testShort     thrpt    5  39132.015 ± 7102.311  ops/ms

The current implementation exhibits negative scalability: adding more threads hurts throughput (~5% loss going from 1T to 8T) By removing the synchronized state, the implementation now scales perfectly (~8x increase going from 1T to 8T for the longer test case) There is a very minor (~1%) penalty for pure single-threaded throughput since we no longer reuse state. I believe this is acceptable in the context of allowing multiple threads to work without coordination.

Checklist

  • [x] Required: Issue filed: ICU-23061
  • [x] Required: The PR title must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [x] Required: Each commit message must be prefixed with a JIRA Issue number. Example: "ICU-1234 Fix xyz"
  • [x] Issue accepted (done by Technical Committee after discussion)
  • [x] Tests included, if applicable
  • [x] API docs and/or User Guide docs changed or added, if applicable

stevenschlansker avatar Feb 25 '25 18:02 stevenschlansker

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/README.txt is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

I needed to add the license header to the performance test.

stevenschlansker avatar Feb 27 '25 21:02 stevenschlansker

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/pom.xml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

I removed the : from my commit messages, I think it was tripping up the checker. Also rebased on top of main.

stevenschlansker avatar Mar 17 '25 23:03 stevenschlansker

The change itself looks reasonable, assuming the performance advantages are as billed. I can't speak to the POM changes or the performance test, as that's not stuff I'm really familiar with.

Usually, bug fixes in ICU need to be done in both C++ and Java when appropriate. Do we have a similar problem on the C++ side? If so, I'd like to see a fix for that here too.

richgillam avatar Mar 18 '25 20:03 richgillam

These changes remind me of something. I think this Java lowercase transliterator still uses the locale. The ICU4C implementation removed that long ago because of how the registry works with sharing the singleton objects. Though the implementation in Java hard codes it to be US English casing only. So it's not much different.

I thought the transliterator casing functionality in ICU4C was doing something funny with thread safety long ago, but I can't seem to find the concerning code anymore. Perhaps it was removed, or I missed it.

grhoten avatar Mar 21 '25 05:03 grhoten

I can't speak to the POM changes or the performance test, as that's not stuff I'm really familiar with.

The introduction of JMH is optional, and if the project maintainers don't want to introduce it, I can remove these changes. I see that ICU already has its own benchmark harness. JMH is nice because it handles a lot of the boilerplate for you - help messages, warming up the JVM, timing runs - in a way that the "JVM experts" have thought long and hard about. The benchmark could be converted to the ICU runner, or not committed, if that's preferred. But I think you'll find that more and more outside contributors would already come in with JMH familiarity.

stevenschlansker avatar Mar 21 '25 16:03 stevenschlansker

We really need one or more people from the ICU4J side of things to comment here...

richgillam avatar Mar 25 '25 21:03 richgillam

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/pom.xml is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

I rebased and resolved conflicts.

stevenschlansker avatar Mar 28 '25 00:03 stevenschlansker

Anything I should do to move this PR forward? If the JMH benchmark is acceptable, I can squash the commits together. I didn't want to do that until we confirmed it would be accepted.

stevenschlansker avatar May 20 '25 15:05 stevenschlansker

Notice: the branch changed across the force-push!

  • icu4j/main/translit/src/main/java/com/ibm/icu/text/BreakTransliterator.java is now changed in the branch
  • icu4j/main/translit/src/main/java/com/ibm/icu/text/CaseFoldTransliterator.java is now changed in the branch
  • icu4j/main/translit/src/main/java/com/ibm/icu/text/TitlecaseTransliterator.java is now changed in the branch
  • icu4j/main/translit/src/main/java/com/ibm/icu/text/UppercaseTransliterator.java is now changed in the branch
  • icu4j/perf-tests/pom.xml is different
  • icu4j/perf-tests/README.txt is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/CaseFoldTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is now changed in the branch
  • icu4j/pom.xml is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/CaseFoldTransliteratorPerfTest.java is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is different
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerf.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

I re-ran the benchmarks with the latest changes.

synchronized, 1T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   419.108 ±  26.005  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  4022.897 ± 124.318  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   802.483 ±  29.266  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  5383.847 ± 277.556  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   820.756 ±  54.370  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  5015.698 ± 216.265  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   820.914 ±  54.945  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  4716.300 ± 276.341  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   706.661 ±  29.485  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  4634.731 ± 368.809  ops/ms

synchronized, 8T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   594.238 ±  15.677  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  3328.109 ±  58.666  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   797.488 ±  19.831  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  4302.125 ±  36.354  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   801.179 ±  21.477  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  4366.354 ±  21.683  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   802.491 ±   5.279  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  3835.846 ± 205.297  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   598.344 ±  16.620  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  3612.776 ±  10.186  ops/ms

local-alloc, 1T:

Benchmark                                     Mode  Cnt     Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   378.975 ±   8.528  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5  2176.368 ±  66.654  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   817.171 ±  40.881  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  5887.798 ± 463.620  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   847.376 ±  53.358  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  5730.719 ± 377.115  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   835.202 ±  55.587  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  5244.637 ± 246.148  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   617.542 ±  43.544  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  5048.793 ± 404.342  ops/ms

local-alloc, 8T:

Benchmark                                     Mode  Cnt      Score     Error   Units
BreakTransliteratorPerf.testSentence         thrpt    5   2805.141 ±  33.812  ops/ms
BreakTransliteratorPerf.testShort            thrpt    5   4805.065 ± 171.575  ops/ms
CaseFoldTransliteratorPerfTest.testSentence  thrpt    5   6810.324 ±  56.592  ops/ms
CaseFoldTransliteratorPerfTest.testShort     thrpt    5  44420.295 ± 890.433  ops/ms
LowercaseTransliteratorPerf.testSentence     thrpt    5   6655.300 ±  20.258  ops/ms
LowercaseTransliteratorPerf.testShort        thrpt    5  42945.707 ± 116.837  ops/ms
TitlecaseTransliteratorPerf.testSentence     thrpt    5   5288.342 ±  97.473  ops/ms
TitlecaseTransliteratorPerf.testShort        thrpt    5  32089.013 ± 633.706  ops/ms
UppercaseTransliteratorPerf.testSentence     thrpt    5   4585.569 ±  66.147  ops/ms
UppercaseTransliteratorPerf.testShort        thrpt    5  36655.642 ± 421.983  ops/ms

stevenschlansker avatar May 31 '25 00:05 stevenschlansker

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/BreakTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/LowercaseTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/TitlecaseTransliteratorPerfTest.java is now changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerf.java is no longer changed in the branch
  • icu4j/perf-tests/src/main/java/com/ibm/icu/dev/test/perf/UppercaseTransliteratorPerfTest.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

I made the suggested renamings. Please let me know what you would like to do with the BreakTransliterator non-public access case.

stevenschlansker avatar Jun 06 '25 16:06 stevenschlansker

Notice: the branch changed across the force-push!

  • icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

It looks like all of the issues have been addressed on this PR and we have been ready to merge, we just overlooked it. I did some minor bookkeeping details to push it over the finish line and finally merge it.

FYI on bookkeeping details needed before the merge: CI needed to be manually enabled again (not sure why, once has been good enough for new contributors). The CI copyright check job failed because we're adding a new file in this PR, and it didn't have that verbiage, so I amended the PR with the diff below to add it, and then did a force push back to the PR branch (and then reapproved and merged).

diff --git a/icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java b/icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java
index 5afbe1e92bd..a80b5da2fa7 100644
--- a/icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java
+++ b/icu4j/perf-tests/src/main/java/com/ibm/icu/text/BreakTransliteratorAccess.java
@@ -1,3 +1,5 @@
+// © 2025 and later: Unicode, Inc. and others.
+// License & terms of use: http://www.unicode.org/copyright.html
 package com.ibm.icu.text;
 
 public class BreakTransliteratorAccess {

echeran avatar Jun 26 '25 21:06 echeran