commons-lang icon indicating copy to clipboard operation
commons-lang copied to clipboard

Optimize string split methods: 1. Use ThreadLocal to make reuse of th…

Open YuyuZha0 opened this issue 5 years ago • 13 comments

####Optimize string split methods:

  • In current impl of split methods, a string array list is created every time the method called, this should increase gc burden, also, if a string contains too many segments, it should contains several array resizing during the method call. Use ThreadLocal to make reuse of ArrayList and avoid resizing, this trick can also be found on JDK: see java.math.BigDecimal#toString();

// Private class to build a string representation for BigDecimal object.
    // "StringBuilderHelper" is constructed as a thread local variable so it is
    // thread safe. The StringBuilder field acts as a buffer to hold the temporary
    // representation of BigDecimal. The cmpCharArray holds all the characters for
    // the compact representation of BigDecimal (except for '-' sign' if it is
    // negative) if its intCompact field is not INFLATED. It is shared by all
    // calls to toString() and its variants in that particular thread.
    static class StringBuilderHelper {
        final StringBuilder sb;    // Placeholder for BigDecimal string
        final char[] cmpCharArray; // character array to place the intCompact

        StringBuilderHelper() {
            sb = new StringBuilder();
            // All non negative longs can be made to fit into 19 character array.
            cmpCharArray = new char[19];
        }

        // Accessors.
        StringBuilder getStringBuilder() {
            sb.setLength(0);
            return sb;
        }
       ...

  • From JetBrains Intellij Idea inspection:

There are two styles to convert a collection to an array: either using a pre-sized array (like c.toArray(new String[c.size()])) or using an empty array (like c.toArray(new String[0]).

In older Java versions using pre-sized array was recommended, as the reflection call which is necessary to create an array of proper size was quite slow. However since late updates of OpenJDK 6 this call was intrinsified, making the performance of the empty array version the same and sometimes even better, compared to the pre-sized version. Also passing pre-sized array is dangerous for a concurrent or synchronized collection as a data race is possible between the size and toArray call which may result in extra nulls at the end of the array, if the collection was concurrently shrunk during the operation.

This inspection allows to follow the uniform style: either using an empty array (which is recommended in modern Java) or using a pre-sized array (which might be faster in older Java versions or non-HotSpot based JVMs).

See more:https://shipilev.net/blog/2016/arrays-wisdom-ancients/

YuyuZha0 avatar Aug 21 '19 03:08 YuyuZha0

Coverage Status

Coverage decreased (-0.2%) to 95.175% when pulling 856e9b3d5fac403f8a1d4f9cb6e279376b052f04 on YuyuZha0:master into f0534ba8547ab5a9c64c4cecf0fa98fefa7ccc1d on apache:master.

coveralls avatar Aug 21 '19 05:08 coveralls

Thanks for the pull request @YuyuZha0 and for the explanation & links about the fix. I don't have time for a thorough review now, and found nothing out of ordinary after a cursory review.

While we try to write code that performs well, there is a balance between maintainability and performance. So it will be important to confirm that while this improves performance, it won't be too hard to maintain the code (e.g. a comment on the newly added code explaining why those variables are being created could be helpful to other developers) :+1:

kinow avatar Aug 21 '19 06:08 kinow

Thanks for reviewing it @Stzx ! I will have to sit down with the IDE and read the code with calm to do a proper review, unless another dev beats me to it (no hard feelings :). But I will re-read your comments and take your comment into consideration (really helpful, thanks).

kinow avatar Aug 21 '19 06:08 kinow

Anyway, thanks for your effort! @kinow @Stzx

YuyuZha0 avatar Aug 21 '19 06:08 YuyuZha0

On my Mac(Intel Core i5, 2.3GHz, openjdk version "11.0.2"), It brings about 17% of performance improvation:

Benchmark                                   Mode  Cnt    Score    Error  Units
StringSplitBenchmark.testCommonsLang3Split  avgt   25  928.749 ± 11.299  ns/op
StringSplitBenchmark.testFastSplitUtils     avgt   25  771.481 ± 19.010  ns/op

Here is the test cases:

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 5, time = 5)
public class StringSplitBenchmark {

    private static final char separator = '@';

    @Benchmark
    public String[] testCommonsLang3Split(StringSupplier stringSupplier) {
        return StringUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
    }

    @Benchmark
    public String[] testFastSplitUtils(StringSupplier stringSupplier) {
        return FastSplitUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
    }


    @State(Scope.Thread)
    public static class StringSupplier implements Supplier<String> {

        private final String[] array;
        private int index = 0;

        public StringSupplier() {

            List<String> list = new ArrayList<>(1000);
            ThreadLocalRandom random = ThreadLocalRandom.current();
            for (int i = 0; i < 1000; i++) {
                String s = StringUtils.join(
                        random.ints(25).toArray(),
                        separator
                );
                list.add(s);
            }
            this.array = list.toArray(new String[0]);
        }

        @Override
        public String get() {
            if (index >= array.length)
                index = 0;
            return array[index++];
        }
    }
}

I think this may be helpful to you @kinow

YuyuZha0 avatar Aug 21 '19 07:08 YuyuZha0

@kinow Strongly opposed to accepting this PR: I don't have studied this in detail, but in my oppinion we'd loose thread safety here. Split methods, and similar simple stuff should always be thread safe, unless there is a very good reason.

jochenw avatar Aug 22 '19 07:08 jochenw

I know your opinion, but i was just thinking about where we lose the thread safety..., you see, there is a ThreadLocal. @jochenw What I'm warring about is the memory impact...

YuyuZha0 avatar Aug 22 '19 12:08 YuyuZha0

@YuyuZha0

On my Mac(Intel Core i5, 2.3GHz, openjdk version "11.0.2"), It brings about 17% of performance improvation:

I believe there are other methods in Commons Lang that could be optimized by removing branches (e.g. of what I mean), or by using more specialized data structures.

While some users could welcome this optimization, this is not the main goal of Commons Lang (i.e. it is not to offer what is missing in the Java language with the best performance, but yes to provide what is missing in the Java language).

And adding it would increase the code readability and maintenance. We should be able to accommodate performance improvements, but I think this one would make a class already complicated to maintain, a bit harder to maintain.

kinow avatar Aug 25 '19 01:08 kinow

@Stzx

I am worried that some problems will still be caused by using the clear method. such as

I think this does not apply here. The instance that was a List would now be a SplitBuffer, i.e.

final SplitBuffer buffer = SPLIT_BUFFER_THREAD_LOCAL.get().getBuffer();

https://github.com/apache/commons-lang/pull/443/files#diff-3f4c835875ddc8a211c0272e8be51394R7902-R7903

So buffer doesn't need to respect the List/ArrayList contracts.

kinow avatar Aug 25 '19 01:08 kinow

@jochenw

@kinow Strongly opposed to accepting this PR: I don't have studied this in detail, but in my oppinion we'd loose thread safety here. Split methods, and similar simple stuff should always be thread safe, unless there is a very good reason.

It appears to be that the solution proposed by @YuyuZha0 would be thread-safe, though I haven't really tried much of the code to confirm.

From what I understand, the suggestion here is to use a single data structure per thread (see previous comments about a ThreadLocal), following the approach used by the JVM's BigDecimal.

While it seems like it would indeed improve performance, I also feel like opposing the change but from a maintainability point of view. StringUtils has well over 5K lines, and the more we keep adding to it, the harder it gets to maintain.

And there is the risk of missing some other use case and having issues in multi-threaded environments as you suggested - even though we have the ThreadLocal, I haven't checked other variables used, or if there is any other possibility of other issues.

kinow avatar Aug 25 '19 01:08 kinow

@Stzx

I am worried that some problems will still be caused by using the clear method. such as

I think this does not apply here. The instance that was a List would now be a SplitBuffer, i.e.

final SplitBuffer buffer = SPLIT_BUFFER_THREAD_LOCAL.get().getBuffer();

https://github.com/apache/commons-lang/pull/443/files#diff-3f4c835875ddc8a211c0272e8be51394R7902-R7903

So buffer doesn't need to respect the List/ArrayList contracts.

@kinow

At the time, the code used in the list, and the subsequent commits were improved, so I was concerned at the time. It seems that many problems have been solved now.

Stzx avatar Aug 25 '19 02:08 Stzx

At the time, the code used in the list, and the subsequent commits were improved, so I was concerned at the time. It seems that many problems have been solved now.

Ah, got it! Sorry, I lost track of the changes since my first comment. Thanks for clarifying.

kinow avatar Aug 25 '19 02:08 kinow

@kinow Thanks for the carefully reviewing ! Nice weekend, isn't it? I will edit the code later follow your advice. Currently I was on the performance, I've tried more cases, here is the result:

Benchmark                                   (arrayLen)  Mode  Cnt     Score    Error  Units
StringSplitBenchmark.testCommonsLang3Split          10  avgt   25   499.547 ± 10.716  ns/op
StringSplitBenchmark.testCommonsLang3Split          30  avgt   25  1502.510 ± 16.956  ns/op
StringSplitBenchmark.testCommonsLang3Split          50  avgt   25  2467.303 ± 18.970  ns/op
StringSplitBenchmark.testFastSplitUtils             10  avgt   25   396.252 ±  4.653  ns/op
StringSplitBenchmark.testFastSplitUtils             30  avgt   25  1145.600 ±  5.604  ns/op
StringSplitBenchmark.testFastSplitUtils             50  avgt   25  1885.414 ±  4.121  ns/op
StringSplitBenchmark.testGuavaSplit                 10  avgt   25   565.904 ±  5.483  ns/op
StringSplitBenchmark.testGuavaSplit                 30  avgt   25  1665.049 ± 81.051  ns/op
StringSplitBenchmark.testGuavaSplit                 50  avgt   25  2758.394 ±  7.684  ns/op

Cases is shown bellow:

import com.google.common.base.Splitter;
import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
*
 * @author zhaoyuyu
 * @since 2019-08-21
 **/
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@Warmup(iterations = 5, time = 5)
@Measurement(iterations = 5, time = 5)
public class StringSplitBenchmark {

    private static final char separator = '@';
    private static final Splitter splitter = Splitter.on(separator);


    @Benchmark
    public String[] testCommonsLang3Split(StringSupplier stringSupplier) {
        return StringUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
    }

    @Benchmark
    public String[] testFastSplitUtils(StringSupplier stringSupplier) {
        return FastSplitUtils.splitPreserveAllTokens(stringSupplier.get(), separator);
    }

    @Benchmark
    public String[] testGuavaSplit(StringSupplier supplier) {
        return splitter.splitToList(supplier.get()).toArray(new String[0]);
    }


    @State(Scope.Thread)
    public static class StringSupplier implements Supplier<String> {

        @Param({"10", "30", "50"})
        private int arrayLen;

        private String[] array;
        private int index = 0;

        @Setup
        public void setup() {

            List<String> list = new ArrayList<>(1000);
            ThreadLocalRandom random = ThreadLocalRandom.current();
            for (int i = 0; i < 1000; i++) {
                String s = StringUtils.join(
                        random.ints(arrayLen).toArray(),
                        separator
                );
                list.add(s);
            }
            this.array = list.toArray(new String[0]);
        }

        @Override
        public String get() {
            if (index >= array.length)
                index = 0;
            return array[index++];
        }
    }
}

The reason why I propose this optimization is that sometimes these methods are really under heavily usage(In my case, I use splitPreserveAllTokens for log processing, the method would be called billions of times every day). So for me, the performance is really important.

The StringUtils is widely used, any edition must be cautiously, so I fully understand you warring. In computer science, everything would be a trade off, it's really a hard choice.

YuyuZha0 avatar Aug 25 '19 03:08 YuyuZha0