elasticsearch-js
undici pipelining & idempotent
If you want to take advantage of HTTP pipelining with undici, you should probably set the idempotent: true option on the POST/PUT requests that are idempotent, e.g. /_bulk. Otherwise undici won't pipeline them.
Refs: https://undici.nodejs.org/#/docs/api/Dispatcher?id=parameter-dispatchoptions
Example of how I would do a bulk helper.
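const { Pool } = require('undici')

// `url` (the Elasticsearch node URL) and `logger` are assumed to be defined elsewhere.
const client = new Pool(url, {
  pipelining: 3,
})

// Flush periodically; unref() keeps the timer from holding the process open.
setInterval(() => {
  flushTraces()
}, 30e3).unref()

const HEADERS = ['content-type', 'application/x-ndjson']

let traces = ''

function trace(obj) {
  // Build the document in one step so an empty `obj` still yields valid JSON.
  const doc = JSON.stringify({ '@timestamp': new Date().toISOString(), ...obj })
  // Elasticsearch index names must be lowercase.
  traces += `{ "create": { "_index": "my-index" } }\n`
  traces += `${doc}\n`
  // Flush once roughly 1 MiB of characters has been buffered.
  if (traces.length > 1024 * 1024) {
    flushTraces()
  }
}

async function flushTraces() {
  if (!traces) {
    return
  }

  try {
    // Swap the buffer out before awaiting so new traces are not lost.
    const requestBody = traces
    traces = ''

    const { statusCode, body: responseBody } = await client.request({
      path: '/_bulk',
      method: 'POST',
      idempotent: true,
      headers: HEADERS,
      body: requestBody,
    })

    if (statusCode !== 200) {
      logger.error(
        { statusCode, body: await responseBody.json() },
        'trace failed'
      )
    } else {
      // Always consume the response body so the connection can be reused.
      await responseBody.dump()
    }
  } catch (err) {
    logger.error({ err }, 'trace failed')
  }
}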
Also, it seems that Elasticsearch does not provide a keep-alive timeout hint, which is problematic. What is the default keep-alive timeout for esc?
Hello! Thanks for the tip! I'll look into this. Side note: bulk is not idempotent, as it is an API used to perform write operations. Many read APIs use POST though, so we could improve there.
> bulk is not idempotent
What part is not idempotent?
It depends on what you are doing. While delete and create operations are idempotent, index and update will at least update the document's _version field. And if you are sending an index operation without a document id, you will always be creating a new document.
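The distinction above could be checked mechanically before sending a bulk body. The following is only a rough sketch: isIdempotentBulk is a hypothetical helper, not part of elasticsearch-js or undici. It assumes an NDJSON body where delete actions have no source line and every other action is followed by exactly one. It also adds a stricter assumption of its own: create only counts as idempotent when it carries an explicit _id, since an auto-generated id would produce a new document on every retry.

```javascript
// Hypothetical helper: returns true only if every action in an NDJSON bulk
// body is a `delete`, or a `create` with an explicit `_id`.
function isIdempotentBulk (ndjson) {
  const lines = ndjson.split('\n').filter(line => line.trim() !== '')
  let i = 0
  while (i < lines.length) {
    const action = JSON.parse(lines[i])
    const op = Object.keys(action)[0]
    const meta = action[op]
    if (op === 'delete') {
      i += 1 // delete has no source line
      continue
    }
    if (op === 'create' && meta && meta._id != null) {
      i += 2 // skip the action line and its source line
      continue
    }
    // index, update, or create without an id: treat as not idempotent
    return false
  }
  return true
}
```

The result could then be passed as the idempotent option of the request, so that only bodies that are safe to retry are pipelined.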
@delvedor Thanks! Maybe we could dynamically check for those and set idempotent accordingly?
Also I would recommend using different undici dispatchers for pipelinable and non-pipelinable requests.
I'm mostly familiar with couchdb and in our couch client we use one dispatcher per couch view.
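One way to read the suggestion about separate dispatchers is a small router in front of two pools. This is a sketch under assumptions: createRouter is a hypothetical name, and pipelined and plain stand for any objects with an undici-style request(opts) method, for example two undici Pool instances created with different pipelining values.

```javascript
// Hypothetical helper: routes each request to one of two dispatchers.
// `pipelined` and `plain` are assumed to expose an undici-style request(opts).
function createRouter ({ pipelined, plain }) {
  return function request (opts) {
    // undici treats GET and HEAD as idempotent unless told otherwise;
    // an explicit `idempotent` flag on the request takes precedence here.
    const idempotent = opts.idempotent ?? (opts.method === 'GET' || opts.method === 'HEAD')
    return (idempotent ? pipelined : plain).request(opts)
  }
}
```

With this split, slow non-idempotent writes do not sit in the same pipeline as the reads queued behind them.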
> Thanks! Maybe we could dynamically check for those and set idempotent accordingly?
Currently, we are not tracking which operations are idempotent and which aren't, so we can only detect the HTTP method. But we can improve our specification to track this and make our code generation smarter.
Thanks for your suggestions!
Cool! What about the keep-alive hint?
Elasticsearch has no timeout or pipelining limit and sends no hints :)
> Elasticsearch has no timeout or pipelining limit and sends no hints :)
Then you might want to set keepAliveTimeout: 5 * 60e3 or something so that undici doesn't aggressively disconnect. The default is quite short to be safe.
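As a configuration sketch, the options discussed in this thread could be combined as below. The poolOptions name is only illustrative; the object is meant to be passed as the second argument to undici's Pool constructor, and the five-minute value is the one suggested above rather than a documented Elasticsearch setting.

```javascript
// Illustrative options object, intended for `new Pool(url, poolOptions)`.
const poolOptions = {
  // Allow up to 3 requests in flight per connection.
  pipelining: 3,
  // Keep idle sockets for 5 minutes (in milliseconds), since the server
  // sends no keep-alive hint for undici to follow.
  keepAliveTimeout: 5 * 60e3
}
```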
Another thing (since I'm on this topic): it is very important that the client fully consumes any and all response bodies. It might be worth adding this to the documentation if end users are able to get streaming responses.
> It is very important that the client fully consumes any and all response bodies. It might be worth adding this to the documentation if end users are able to get streaming responses.
By default, the client will always consume the body (you can find here how undici is being used), but advanced users can pass the asStream option, which will give them back the raw stream from undici. You are right that we should mention the risks of doing so in the docs :)
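For users who opt into raw streams, a minimal sketch of consuming a body they do not need might look like the following. drain is a hypothetical helper, not something provided by elasticsearch-js or undici. It assumes the response body is a Node.js Readable stream whose chunks have a length, as Buffers and strings do.

```javascript
// Hypothetical helper: read and discard every chunk of a readable stream so
// the underlying connection can be reused. Resolves with the amount discarded.
async function drain (stream) {
  let discarded = 0
  for await (const chunk of stream) {
    discarded += chunk.length
  }
  return discarded
}
```

Calling await drain(body) on every response that is not otherwise read, including error responses, keeps unread bodies from holding a connection.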