elasticsearch-js

undici pipelining & idempotent

Open ronag opened this issue 2 years ago • 14 comments

If you want to take advantage of HTTP pipelining with undici, you should probably set the idempotent: true option on the POST/PUT requests that are idempotent, e.g. /_bulk. Otherwise undici won't pipeline them.

Refs: https://undici.nodejs.org/#/docs/api/Dispatcher?id=parameter-dispatchoptions

ronag avatar Feb 18 '22 07:02 ronag

Example of how I would do a bulk helper.

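import { Pool } from 'undici'

// `url` (the Elasticsearch node address) and `logger` are assumed to be defined elsewhere.
const client = new Pool(url, {
  pipelining: 3, // up to 3 requests in flight per connection
})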

setInterval(() => {
  flushTraces()
}, 30e3).unref()

const HEADERS = ['content-type', 'application/x-ndjson']
let traces = ''

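function trace(obj) {
  // Strip the outer braces so the fields can be spliced in after @timestamp.
  // Assumes obj serializes to a non-empty JSON object.
  const doc = JSON.stringify(obj).slice(1, -1)
  // Elasticsearch index names must be lowercase.
  traces += `{ "create": { "_index": "my-index" } }\n`
  traces += `{ "@timestamp": "${new Date().toISOString()}", ${doc} }\n`
  // Flush early once roughly 1 MiB has been buffered.
  if (traces.length > 1024 * 1024) {
    flushTraces()
  }
}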

async function flushTraces() {
  if (!traces) {
    return
  }

  try {
    const requestBody = traces
    traces = ''

    const { statusCode, body: responseBody } = await client.request({
      path: '/_bulk',
      method: 'POST',
      idempotent: true,
      headers: HEADERS,
      body: requestBody,
    })

    if (statusCode !== 200) {
      logger.error(
        { statusCode, body: await responseBody.json() },
        'trace failed'
      )
    } else {
      // Discard the body so the connection is freed for the next request.
      await responseBody.dump()
    }
  } catch (err) {
    logger.error({ err }, 'trace failed')
  }
}

ronag avatar Feb 18 '22 07:02 ronag

Also, it seems that Elasticsearch does not provide a keep-alive timeout hint, which is problematic. What is the default keep-alive timeout for esc?

ronag avatar Feb 18 '22 07:02 ronag

Hello! Thanks for the tip! I'll look into this. Sidenote: bulk is not idempotent, as it is an API used to perform write operations. Many read APIs use POST though, so we could improve there.

delvedor avatar Feb 21 '22 10:02 delvedor

> bulk is not idempotent

What part is not idempotent?

ronag avatar Feb 21 '22 10:02 ronag

It depends on what you are doing. While delete and create operations are idempotent, index and update will at least bump the document's _version field. And if you send an index operation without a document ID, you will always create a new document.

delvedor avatar Feb 21 '22 10:02 delvedor

@delvedor Thanks! Maybe we could dynamically check for those and set idempotent accordingly?
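
A rough sketch of what such a dynamic check might look like, assuming the bulk body is already an NDJSON string. The helper name `isIdempotentBulkBody` is hypothetical and not part of elasticsearch-js or undici. It follows the rule described above (delete and create are retry-safe, index and update are not), with one extra conservative assumption: a create action only counts as idempotent when it carries an explicit `_id`.

```javascript
// Hypothetical helper, not part of elasticsearch-js or undici.
// Returns true only if every action in an NDJSON bulk body looks safe to retry.
function isIdempotentBulkBody(ndjson) {
  const lines = ndjson.split('\n').filter((line) => line.trim() !== '')
  let i = 0
  while (i < lines.length) {
    const action = JSON.parse(lines[i])
    const [op] = Object.keys(action)
    const meta = action[op] || {}
    if (op === 'delete') {
      i += 1 // a delete action has no source line after it
    } else if (op === 'create' && meta._id !== undefined) {
      i += 2 // skip the source line that follows the action line
    } else {
      return false // index, update, or create without an explicit _id
    }
  }
  return true
}

const body =
  '{"create":{"_index":"my-index","_id":"1"}}\n' +
  '{"field":"value"}\n' +
  '{"delete":{"_index":"my-index","_id":"2"}}\n'

console.log(isIdempotentBulkBody(body))
```

The result could then be passed as the `idempotent` option of the request, so only retry-safe bulk bodies get pipelined.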

ronag avatar Feb 21 '22 10:02 ronag

Also I would recommend using different undici dispatchers for pipelinable and non-pipelinable requests.
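
One way to picture this, as a minimal sketch: keep two dispatchers and route each request by whether it can be pipelined. The helper `pickDispatcher` and the `pipelined`/`serial` names are hypothetical, and plain objects stand in for the dispatchers so the routing logic can be read on its own.

```javascript
// Hypothetical routing helper. `dispatchers.pipelined` and `dispatchers.serial`
// are assumed to be two separate undici dispatchers, for example
// new Pool(url, { pipelining: 3 }) and new Pool(url, { pipelining: 1 }).
function pickDispatcher(dispatchers, { method, idempotent = false }) {
  const pipelinable = method === 'GET' || method === 'HEAD' || idempotent
  return pipelinable ? dispatchers.pipelined : dispatchers.serial
}

// Stand-in objects used only to demonstrate the routing decision.
const dispatchers = {
  pipelined: { name: 'pipelined' },
  serial: { name: 'serial' },
}

console.log(pickDispatcher(dispatchers, { method: 'GET' }).name)
console.log(pickDispatcher(dispatchers, { method: 'POST' }).name)
console.log(pickDispatcher(dispatchers, { method: 'POST', idempotent: true }).name)
```

Keeping the two groups apart means a non-pipelinable write does not have to share connections with the pipelined traffic.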

ronag avatar Feb 21 '22 10:02 ronag

I'm mostly familiar with CouchDB, and in our Couch client we use one dispatcher per Couch view.

ronag avatar Feb 21 '22 10:02 ronag

> Thanks! Maybe we could dynamically check for those and set idempotent accordingly?

Currently, we are not tracking which operations are idempotent and which aren't, so we can only detect the HTTP method. But we can improve our specification to track this and make our code generation smarter.

Thanks for your suggestions!

delvedor avatar Feb 21 '22 11:02 delvedor

Cool! What about the keep-alive hint?

ronag avatar Feb 21 '22 11:02 ronag

Elasticsearch has no timeout or pipelining limit and sends no hints :)

delvedor avatar Feb 21 '22 11:02 delvedor

> Elasticsearch has no timeout or pipelining limit and sends no hints :)

Then you might want to set keepAliveTimeout: 5 * 60e3 or something so that undici doesn't aggressively disconnect. The default is quite short to be safe.
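
As a small illustration, the options could be collected like this. The `poolOptions` name is hypothetical, and the 5-minute value is the one suggested above, not a verified recommendation for any particular deployment.

```javascript
// Options intended for undici's Pool (or Client) constructor.
const poolOptions = {
  pipelining: 3,
  // Keep idle sockets open for 5 minutes (value in milliseconds),
  // since the server sends no keep-alive hint to go by.
  keepAliveTimeout: 5 * 60e3,
}

// Assumed usage, with `url` defined elsewhere:
// const client = new Pool(url, poolOptions)
console.log(poolOptions.keepAliveTimeout)
```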

ronag avatar Feb 21 '22 11:02 ronag

Another thing (since I'm on this topic): it is very important that the client fully consumes any and all response bodies. It might be worth adding this to the documentation if end users are able to get streaming responses.

ronag avatar Feb 21 '22 11:02 ronag

> It is very important that the client fully consumes any and all response bodies. It might be worth adding this to the documentation if end users are able to get streaming responses.

By default, the client will always consume the body (you can find here how undici is being used), but advanced users can pass the asStream option which will give them back the raw stream from undici. You are right that we should mention the risks of doing so in the docs :)
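
For users who do take the raw stream, a minimal sketch of draining a body they don't need. This assumes the stream is async-iterable, as Node.js Readable streams are. The `drain` helper and the `fakeBody` generator are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper: read a response stream to the end and throw the data away.
// Works with any async-iterable stream, which includes Node.js Readable streams.
async function drain(body) {
  let discarded = 0
  for await (const chunk of body) {
    discarded += chunk.length
  }
  return discarded
}

// Stand-in for a streamed response body, used only to demonstrate the helper.
async function* fakeBody() {
  yield Buffer.from('{"took":1,')
  yield Buffer.from('"errors":false}')
}

drain(fakeBody()).then((bytes) => {
  console.log(`discarded ${bytes} bytes`)
})
```

With an undici response body, calling `body.dump()` as in the bulk helper earlier in this thread serves the same purpose.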

delvedor avatar Feb 21 '22 12:02 delvedor