Performance bug in BulkHandlers
My application often uses bulk requests for indexing data in ES. According to the profiler, a surprising amount of CPU time is spent in BulkHandlers.buildBulkHttpBody (mkString). Initially I thought that memory allocation might be the cause, but the following code has a much simpler problem:
private[bulk] def buildBulkHttpBody(bulk: BulkRequest): String = {
val builder = StringBuilder.newBuilder
val rows: Iterator[String] = BulkBuilderFn(bulk)
rows.addString(builder, "", "\n", "")
builder.append("\n") // es seems to require a trailing new line as well
builder.mkString
}
Notice, how the StringBuilder is defined:
class StringBuilder(...) extends AbstractSeq[Char]
Calling mkString on builder creates a String by iterating on all individual characters. This is very slow. I think that the initial intention was to use toString or result.
I have a number of alternative solutions:
- Simply replace
mkStringbyresult(). - Replace the whole method by:
BulkBuilderFn(bulk).mkString("", "\n", "\n")
- Avoid the "big string" allocation completely by passing a collection of strings into HttpEntity (best, but more complicated).
For me mkString is an alias to toString, so not sure if that will help. But, replacing the whole method with BulkBuilderFn(bulk).mkString("", "\n", "\n") seems to work fine. Do you want to create a PR, or shall I?
Maybe you checked it on a different scala version. I am using 2.13.14, where:
def result() = underlying.toString
override def toString: String = result()
// mkString is inherited
Please go ahead with a PR, I am too busy today.
My PR is in release 8.14.0, so you can try it out.