commons-lang
commons-lang copied to clipboard
Optimize string split methods: 1. Use ThreadLocal to make reuse of th…
####Optimize string split methods:
- In current impl of split methods, a string array list is created every time the method called, this should increase gc burden, also, if a string contains too many segments, it should contains several array resizing during the method call. Use ThreadLocal to make reuse of ArrayList and avoid resizing, this trick can also be found on JDK: see java.math.BigDecimal#toString();
// Private class to build a string representation for BigDecimal object.
// "StringBuilderHelper" is constructed as a thread local variable so it is
// thread safe. The StringBuilder field acts as a buffer to hold the temporary
// representation of BigDecimal. The cmpCharArray holds all the characters for
// the compact representation of BigDecimal (except for '-' sign' if it is
// negative) if its intCompact field is not INFLATED. It is shared by all
// calls to toString() and its variants in that particular thread.
static class StringBuilderHelper {
final StringBuilder sb; // Placeholder for BigDecimal string
final char[] cmpCharArray; // character array to place the intCompact
StringBuilderHelper() {
sb = new StringBuilder();
// All non negative longs can be made to fit into 19 character array.
cmpCharArray = new char[19];
}
// Accessors.
StringBuilder getStringBuilder() {
sb.setLength(0);
return sb;
}
...
- From JetBrains Intellij Idea inspection:
There are two styles to convert a collection to an array: either using a pre-sized array (like c.toArray(new String[c.size()])) or using an empty array (like c.toArray(new String[0]).
In older Java versions using pre-sized array was recommended, as the reflection call which is necessary to create an array of proper size was quite slow. However since late updates of OpenJDK 6 this call was intrinsified, making the performance of the empty array version the same and sometimes even better, compared to the pre-sized version. Also passing pre-sized array is dangerous for a concurrent or synchronized collection as a data race is possible between the size and toArray call which may result in extra nulls at the end of the array, if the collection was concurrently shrunk during the operation.
This inspection allows to follow the uniform style: either using an empty array (which is recommended in modern Java) or using a pre-sized array (which might be faster in older Java versions or non-HotSpot based JVMs).
See more:https://shipilev.net/blog/2016/arrays-wisdom-ancients/
Coverage decreased (-0.2%) to 95.175% when pulling 856e9b3d5fac403f8a1d4f9cb6e279376b052f04 on YuyuZha0:master into f0534ba8547ab5a9c64c4cecf0fa98fefa7ccc1d on apache:master.
Thanks for the pull request @YuyuZha0 and for the explanation & links about the fix. I don't have time for a thorough review now, and found nothing out of ordinary after a cursory review.
While we try to write code that performs well, there is a balance between maintainability and performance. So it will be important to confirm that while this improves performance, it won't be too hard to maintain the code (e.g. a comment on the newly added code explaining why those variables are being created could be helpful to other developers) :+1:
Thanks for reviewing it @Stzx ! I will have to sit down with the IDE and read the code with calm to do a proper review, unless another dev beats me to it (no hard feelings :). But I will re-read your comments and take your comment into consideration (really helpful, thanks).
Anyway, thanks for your effort! @kinow @Stzx
On my Mac(Intel Core i5, 2.3GHz, openjdk version "11.0.2"), It brings about 17% of performance improvation:
Benchmark Mode Cnt Score Error Units
StringSplitBenchmark.testCommonsLang3Split avgt 25 928.749 ± 11.299 ns/op
StringSplitBenchmark.testFastSplitUtils avgt 25 771.481 ± 19.010 ns/op
Here is the test cases:
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 5, time = 5)
public class StringSplitBenchmark {
private static final char separator = '@';
@Benchmark
public String[] testCommonsLang3Split(StringSupplier stringSupplier) {
return StringUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
}
@Benchmark
public String[] testFastSplitUtils(StringSupplier stringSupplier) {
return FastSplitUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
}
@State(Scope.Thread)
public static class StringSupplier implements Supplier<String> {
private final String[] array;
private int index = 0;
public StringSupplier() {
List<String> list = new ArrayList<>(1000);
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < 1000; i++) {
String s = StringUtils.join(
random.ints(25).toArray(),
separator
);
list.add(s);
}
this.array = list.toArray(new String[0]);
}
@Override
public String get() {
if (index >= array.length)
index = 0;
return array[index++];
}
}
}
I think this may be helpful to you @kinow
@kinow Strongly opposed to accepting this PR: I don't have studied this in detail, but in my oppinion we'd loose thread safety here. Split methods, and similar simple stuff should always be thread safe, unless there is a very good reason.
I know your opinion, but i was just thinking about where we lose the thread safety..., you see, there is a ThreadLocal. @jochenw What I'm warring about is the memory impact...
@YuyuZha0
On my Mac(Intel Core i5, 2.3GHz, openjdk version "11.0.2"), It brings about 17% of performance improvation:
I believe there are other methods in Commons Lang that could be optimized by removing branches (e.g. of what I mean), or by using more specialized data structures.
While some users could welcome this optimization, this is not the main goal of Commons Lang (i.e. it is not to offer what is missing in the Java language with the best performance, but yes to provide what is missing in the Java language).
And adding it would increase the code readability and maintenance. We should be able to accommodate performance improvements, but I think this one would make a class already complicated to maintain, a bit harder to maintain.
@Stzx
I am worried that some problems will still be caused by using the clear method. such as
I think this does not apply here. The instance that was a List
would now be a SplitBuffer
, i.e.
final SplitBuffer buffer = SPLIT_BUFFER_THREAD_LOCAL.get().getBuffer();
https://github.com/apache/commons-lang/pull/443/files#diff-3f4c835875ddc8a211c0272e8be51394R7902-R7903
So buffer
doesn't need to respect the List
/ArrayList
contracts.
@jochenw
@kinow Strongly opposed to accepting this PR: I don't have studied this in detail, but in my oppinion we'd loose thread safety here. Split methods, and similar simple stuff should always be thread safe, unless there is a very good reason.
It appears to be that the solution proposed by @YuyuZha0 would be thread-safe, though I haven't really tried much of the code to confirm.
From what I understand, the suggestion here is to use a single data structure per thread (see previous comments about a ThreadLocal
), following the approach used by the JVM's BigDecimal
.
While it seems like it would indeed improve performance, I also feel like opposing the change but from a maintainability point of view. StringUtils
has well over 5K lines, and the more we keep adding to it, the harder it gets to maintain.
And there is the risk of missing some other use case and having issues in multi-threaded environments as you suggested - even though we have the ThreadLocal, I haven't checked other variables used, or if there is any other possibility of other issues.
@Stzx
I am worried that some problems will still be caused by using the clear method. such as
I think this does not apply here. The instance that was a
List
would now be aSplitBuffer
, i.e.final SplitBuffer buffer = SPLIT_BUFFER_THREAD_LOCAL.get().getBuffer();
https://github.com/apache/commons-lang/pull/443/files#diff-3f4c835875ddc8a211c0272e8be51394R7902-R7903
So
buffer
doesn't need to respect theList
/ArrayList
contracts.
@kinow
At the time, the code used in the list, and the subsequent commits were improved, so I was concerned at the time. It seems that many problems have been solved now.
At the time, the code used in the list, and the subsequent commits were improved, so I was concerned at the time. It seems that many problems have been solved now.
Ah, got it! Sorry, I lost track of the changes since my first comment. Thanks for clarifying.
@kinow Thanks for the carefully reviewing ! Nice weekend, isn't it? I will edit the code later follow your advice. Currently I was on the performance, I've tried more cases, here is the result:
Benchmark (arrayLen) Mode Cnt Score Error Units
StringSplitBenchmark.testCommonsLang3Split 10 avgt 25 499.547 ± 10.716 ns/op
StringSplitBenchmark.testCommonsLang3Split 30 avgt 25 1502.510 ± 16.956 ns/op
StringSplitBenchmark.testCommonsLang3Split 50 avgt 25 2467.303 ± 18.970 ns/op
StringSplitBenchmark.testFastSplitUtils 10 avgt 25 396.252 ± 4.653 ns/op
StringSplitBenchmark.testFastSplitUtils 30 avgt 25 1145.600 ± 5.604 ns/op
StringSplitBenchmark.testFastSplitUtils 50 avgt 25 1885.414 ± 4.121 ns/op
StringSplitBenchmark.testGuavaSplit 10 avgt 25 565.904 ± 5.483 ns/op
StringSplitBenchmark.testGuavaSplit 30 avgt 25 1665.049 ± 81.051 ns/op
StringSplitBenchmark.testGuavaSplit 50 avgt 25 2758.394 ± 7.684 ns/op
Cases is shown bellow:
import com.google.common.base.Splitter;
import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
/**
*
* @author zhaoyuyu
* @since 2019-08-21
**/
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 5, time = 5)
public class StringSplitBenchmark {
private static final char separator = '@';
private static final Splitter splitter = Splitter.on(separator);
@Benchmark
public String[] testCommonsLang3Split(StringSupplier stringSupplier) {
return StringUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
}
@Benchmark
public String[] testFastSplitUtils(StringSupplier stringSupplier) {
return FastSplitUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
}
@Benchmark
public String[] testGuavaSplit(StringSupplier supplier) {
return splitter.splitToList(supplier.get()).toArray(new String[0]);
}
@State(Scope.Thread)
public static class StringSupplier implements Supplier<String> {
@Param({"10", "30", "50"})
private int arrayLen;
private String[] array;
private int index = 0;
@Setup
public void setup() {
List<String> list = new ArrayList<>(1000);
ThreadLocalRandom random = ThreadLocalRandom.current();
for (int i = 0; i < 1000; i++) {
String s = StringUtils.join(
random.ints(arrayLen).toArray(),
separator
);
list.add(s);
}
this.array = list.toArray(new String[0]);
}
@Override
public String get() {
if (index >= array.length)
index = 0;
return array[index++];
}
}
}
The reason why I propose this optimization is that sometimes these methods are really under heavily usage(In my case, I use splitPreserveAllTokens for log processing, the method would be called billions of times every day). So for me, the performance is really important.
The StringUtils is widely used, any edition must be cautiously, so I fully understand you warring. In computer science, everything would be a trade off, it's really a hard choice.