elastic4s icon indicating copy to clipboard operation
elastic4s copied to clipboard

Performance bug in BulkHandlers

Open SheliakLyr opened this issue 1 year ago • 3 comments

My application often uses bulk requests for indexing data in ES. According to the profiler, a surprising amount of CPU time is spent in BulkHandlers.buildBulkHttpBody (mkString). Initially I thought that memory allocation might be the cause, but the following code has a much simpler problem:

private[bulk] def buildBulkHttpBody(bulk: BulkRequest): String = {
    val builder = StringBuilder.newBuilder
    val rows: Iterator[String] = BulkBuilderFn(bulk)
    rows.addString(builder, "", "\n", "")
    builder.append("\n") // es seems to require a trailing new line as well
    builder.mkString
  }

Notice, how the StringBuilder is defined:

class StringBuilder(...) extends AbstractSeq[Char]

Calling mkString on builder creates a String by iterating on all individual characters. This is very slow. I think that the initial intention was to use toString or result.

I have a number of alternative solutions:

  1. Simply replace mkString by result().
  2. Replace the whole method by:
BulkBuilderFn(bulk).mkString("", "\n", "\n")
  1. Avoid the "big string" allocation completely by passing a collection of strings into HttpEntity (best, but more complicated).

SheliakLyr avatar Jul 02 '24 11:07 SheliakLyr

For me mkString is an alias to toString, so not sure if that will help. But, replacing the whole method with BulkBuilderFn(bulk).mkString("", "\n", "\n") seems to work fine. Do you want to create a PR, or shall I?

Philippus avatar Jul 02 '24 11:07 Philippus

Maybe you checked it on a different scala version. I am using 2.13.14, where:

  def result() = underlying.toString

  override def toString: String = result()

  // mkString is inherited

Please go ahead with a PR, I am too busy today.

SheliakLyr avatar Jul 02 '24 12:07 SheliakLyr

My PR is in release 8.14.0, so you can try it out.

Philippus avatar Jul 16 '24 09:07 Philippus