
opensearch NPM connection issue on high load with AWS opensearch

Open shashank-sachan opened this issue 3 years ago • 21 comments

Describe the bug

We are trying to replace the elasticsearch NPM package with the opensearch NPM package in our project (using an AWS OpenSearch 1.0 cluster). We connect to AWS OpenSearch from AWS Lambda. Under high load we start getting a ConnectionError: getaddrinfo EMFILE error.

ConnectionError: getaddrinfo EMFILE *******.us-****-1.es.amazonaws.com
    at ClientRequest.onError (/opt/nodejs/node_modules/@opensearch-project/opensearch/lib/Connection.js:126:16)
    at ClientRequest.emit (events.js:400:28)
    at ClientRequest.emit (domain.js:470:12)
    at TLSSocket.socketErrorListener (_http_client.js:475:9)
    at TLSSocket.emit (events.js:400:28)
    at TLSSocket.emit (domain.js:470:12)
    at emitErrorNT (internal/streams/destroy.js:106:8)
    at emitErrorCloseNT (internal/streams/destroy.js:74:3)
    at processTicksAndRejections (internal/process/task_queues.js:82:21) {
  meta: {
    body: null,
    statusCode: null,
    headers: null,
    meta: {
      context: null,
      request: {...},
      name: 'opensearch-js',
      connection: {
        url: 'https://*******.us-****-1.es.amazonaws.com/',
        id: 'https://*******.us-****-1.es.amazonaws.com/',
        headers: {},
        deadCount: 4,
        resurrectTimeout: 1636607063285,
        _openRequests: 0,
        status: 'dead',
        roles: { master: true, data: true, ingest: true }
      },
      attempts: 3,
      aborted: false
    }
  }
}

If I put the elasticsearch NPM package back, it works fine. Judging by the error, this looks like a host resolution problem that is unrelated to opensearch, but the same host works fine with the elasticsearch NPM package. The opensearch NPM package also works fine under low traffic; the errors only start under high load. Is there an internal issue with opensearch HTTP connection management?

Plugins elasticsearch: "^16.5.0" @opensearch-project/opensearch: "^1.0.0"

Host/Environment (please complete the following information): AWS lambda

shashank-sachan avatar Nov 11 '21 06:11 shashank-sachan

Hi @shashank-sachan, thanks for creating this issue. A few parts of your question are confusing. First, you list Plugins elasticsearch: "^16.5.0", which does not look like a Node.js package. It seems to be this plugin (the only one with a 16.5 version)? Could you confirm what this 16.5.0 es plugin is? Second, from a Google search, an EMFILE error means that the OS is refusing to let your program open more files/sockets (some discussion here: https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux). You observed that the Node.js client has no connection issues when load is low but fails when load is high, so this looks like a limit issue. Could you look at the discussion linked above and see whether it helps? Thanks.

ananzh avatar Nov 17 '21 17:11 ananzh

Hi @ananzh, This is the elasticsearch npm - https://www.npmjs.com/package/elasticsearch/v/16.5.0

I cannot change the ulimit because I’m not running this on a Linux server. The code runs inside AWS Lambda and connects to an AWS-managed OpenSearch server (previously known as AWS-managed Elasticsearch).

shashank-sachan avatar Nov 17 '21 17:11 shashank-sachan

@shashank-sachan If you are referring to an issue with the elasticsearch-js client, we do not own that client and would recommend migrating to the opensearch-js client.

vamshin avatar Nov 19 '21 23:11 vamshin

@vamshin I’m not reporting an issue with elasticsearch-js. I’m reporting an issue with opensearch-js only. As I mentioned in the original issue, the problem occurs with opensearch-js under high load.

shashank-sachan avatar Nov 20 '21 03:11 shashank-sachan

Hi @shashank-sachan! From my understanding, the opensearch-js client was forked from the new client, whereas the elasticsearch-js client you are referencing is the deprecated client. Perhaps there is an optimization somewhere in the deprecated version, or there is a missing header or a leak in the new client.

Would you be able to check whether the non-deprecated client also works under high load? If it does not, then perhaps there is a current limitation that is somewhat out of our control in the open-source world. Alternatively, if you are able to share how your function uses the client, we can try to debug it and see whether anything was missed.

If it does work, then there is a bug and we can transfer this issue to the opensearch-js repo.

Thank you!

kavilla avatar Nov 20 '21 04:11 kavilla

Hi! Is there an update on how this issue was resolved? I'm facing the same error with an OpenSearch host.

jnith avatar May 02 '22 21:05 jnith

@jnith @shashank-sachan it sounds like opensearch-js is causing the system to run out of file descriptors.

What's your output from ulimit -a? The open-files limit is generally 1024. There are plenty of articles on how to increase it; I would try that as a workaround.

If you can reproduce this consistently, let's see the output of lsof and lsof -n -i -P (as suggested in a Stack Overflow answer).

It's possible that this client/server combination has higher throughput than the elasticsearch one and causes you to hit a limit earlier under load. It's also possible that the client is leaking handles, which would be a bug.

dblock avatar May 03 '22 15:05 dblock
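
Where lsof cannot be run (for example inside a Lambda function), a rough equivalent of the check requested above is to count the entries in /proc/self/fd from within the process. This is a minimal sketch, assuming a Linux runtime that exposes /proc; the function name `countOpenFileDescriptors` is hypothetical and is not part of opensearch-js.

```javascript
const fs = require("node:fs");

// Hypothetical helper: returns the approximate number of file descriptors
// (files, sockets, pipes) currently open in this process, or null when
// /proc/self/fd is not available (for example on macOS or Windows).
// The count is approximate because listing the directory briefly opens
// a descriptor of its own.
function countOpenFileDescriptors() {
  try {
    return fs.readdirSync("/proc/self/fd").length;
  } catch (err) {
    return null;
  }
}

// Log the count so it can be compared across invocations under load.
console.log("open file descriptors:", countOpenFileDescriptors());
```

A count that keeps climbing from one invocation to the next would point to leaked handles, while a count that stays flat but high would point to the limit simply being reached under load.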

Yes, thanks for the details. I figured out that the issue was caused by each update request to OpenSearch creating a new connection, and I was creating thousands of connections. I overcame the issue by using the Bulk update feature, which limits the number of connections.

jnith avatar May 03 '22 16:05 jnith
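
A minimal sketch of the bulk approach described in the comment above, assuming updates arrive as an array of `{ id, doc }` objects. The helper names (`buildBulkUpdateBody`, `chunk`, `bulkUpdate`), the index name, and the chunk size of 500 are all hypothetical; only `client.bulk({ body })` is the client's own call.

```javascript
// Build the action/document pairs that the bulk API expects for partial updates.
// `updates` is a hypothetical array of { id, doc } objects.
function buildBulkUpdateBody(index, updates) {
  const body = [];
  for (const { id, doc } of updates) {
    body.push({ update: { _index: index, _id: id } });
    body.push({ doc });
  }
  return body;
}

// Split a large array into chunks so that each bulk request stays a manageable size.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Send all updates with one request per chunk, one request at a time,
// instead of one request per document. Returns the number of requests sent.
async function bulkUpdate(client, index, updates, chunkSize = 500) {
  let requests = 0;
  for (const batch of chunk(updates, chunkSize)) {
    const response = await client.bulk({ body: buildBulkUpdateBody(index, batch) });
    requests += 1;
    if (response.body && response.body.errors) {
      throw new Error("bulk request reported item-level errors");
    }
  }
  return requests;
}
```

With a chunk size of 500, 10,000 updates become 20 sequential requests rather than 10,000 concurrent ones, which keeps the number of sockets open at any moment small.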

@dblock I'm using this inside AWS Lambda, so I don't have control over ulimit. But elasticsearch-js was not causing this issue, so maybe we need an optimization in this library.

shashank-sachan avatar May 03 '22 16:05 shashank-sachan

@shashank-sachan Which version of elasticsearch-js were you using? We need to track down the diff (help wanted). Either way, it sounds like we're creating an unbounded number of connections, which should be replaced by a pool.

dblock avatar May 06 '22 22:05 dblock

I've recently been hitting the same issue. Is there a way to control the number of connections? It seems like it's already using ConnectionPool by default. I notice that the ConnectionPool reports the connection as alive when the request is already dead.

hongkheng avatar Aug 08 '23 02:08 hongkheng
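
One possible way to cap the number of sockets, sketched here under assumptions: Node's `https.Agent` accepts a `maxSockets` option, and the client is assumed to accept an `agent` option (either agent options or a function returning an agent), as the client it was forked from does. The values 50 and 10, the helper name `createBoundedAgent`, and the `OPENSEARCH_URL` environment variable are illustrative only.

```javascript
const https = require("node:https");

// Hypothetical helper: one shared agent for the whole process.
// keepAlive reuses sockets between requests; maxSockets caps the number of
// concurrent sockets per host, and extra requests wait in the agent's queue.
function createBoundedAgent(maxSockets = 50) {
  return new https.Agent({ keepAlive: true, maxSockets, maxFreeSockets: 10 });
}

const agent = createBoundedAgent(50);

// Example wiring (commented out so the sketch runs without the package installed;
// the `agent` client option is an assumption to verify against your client version):
// const { Client } = require("@opensearch-project/opensearch");
// const client = new Client({ node: process.env.OPENSEARCH_URL, agent: () => agent });
```

A cap only helps if the same client (and therefore the same agent) is reused; creating a new client per request creates a new set of sockets each time.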

I had exactly the same problem when replacing the Elasticsearch client with the OpenSearch client. The fix was simple: just call the client's close function when you're finished with the client. In my case, the client is re-created on each Lambda invocation (the same client is not reused across invocations).

Sadly, the use of close is not documented in the OpenSearch JavaScript client docs. Only the Java documentation says that you should close the client when you're finished.

perttu-n avatar Sep 11 '23 12:09 perttu-n
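
A minimal sketch of the close-when-finished pattern described in the comment above, for code that creates a client per invocation. The wrapper name `withClient`, the `OPENSEARCH_URL` environment variable, and the index name are hypothetical; `close()` is the client method the comment refers to.

```javascript
// Hypothetical wrapper: create a client, run `work(client)`, and always close
// the client afterwards, even when `work` throws, so its sockets are released.
async function withClient(createClient, work) {
  const client = createClient();
  try {
    return await work(client);
  } finally {
    await client.close();
  }
}

// Example wiring (commented out so the sketch runs without the package installed):
// const { Client } = require("@opensearch-project/opensearch");
// exports.handler = (event) =>
//   withClient(
//     () => new Client({ node: process.env.OPENSEARCH_URL }),
//     (client) => client.search({ index: "my-index", body: { query: { match_all: {} } } })
//   );
```

Putting the close call in `finally` matters under load: a failed request that skips the close leaves its sockets open, which is the situation that accumulates into EMFILE.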

@perttu-n Care to add it to the documentation?

dblock avatar Sep 11 '23 17:09 dblock

I am having the same issue after replacing the elasticsearch client. I am running it from Lambda. Previously I was using elasticsearch version 16.7.3. @shashank-sachan were you able to fix it?

arnabrahman avatar Sep 15 '23 04:09 arnabrahman

OK, so after spending a day on this, I solved it. I realized the client initialization was happening inside the Lambda handler function, so I moved the initialization out of the handler function. Now it uses the same client instance for as long as the Lambda is alive.

arnabrahman avatar Sep 15 '23 11:09 arnabrahman
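
A minimal sketch of the change described in the comment above: the client lives at module scope, which Lambda initializes once per execution environment and reuses across warm invocations. The helper name `getClient`, the `OPENSEARCH_URL` environment variable, and the index name are hypothetical.

```javascript
// Module scope: initialized once per Lambda execution environment (cold start),
// then reused by every invocation that environment serves.
let cachedClient = null;

// Hypothetical helper: create the client on first use and return the same
// instance afterwards. `createClient` is a factory supplied by the caller.
function getClient(createClient) {
  if (cachedClient === null) {
    cachedClient = createClient();
  }
  return cachedClient;
}

// Example wiring (commented out so the sketch runs without the package installed):
// const { Client } = require("@opensearch-project/opensearch");
// const createClient = () => new Client({ node: process.env.OPENSEARCH_URL });
// exports.handler = async (event) => {
//   const client = getClient(createClient); // no new client per invocation
//   const response = await client.search({
//     index: "my-index",
//     body: { query: { match_all: {} } },
//   });
//   return response.body.hits.hits;
// };
```

With this layout the client should not be closed at the end of each invocation, since the next invocation reuses it.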

@arnabrahman Would you care to contribute something to the documentation/samples for the next person? Maybe an example for Lambda with some cautionary tales?

dblock avatar Sep 15 '23 16:09 dblock

@dblock Sure, happy to contribute to open source. Let me know how to get started.

arnabrahman avatar Sep 16 '23 06:09 arnabrahman

@arnabrahman You can get started here: https://github.com/opensearch-project/.github/blob/main/ONBOARDING.md. Thanks!

dblock avatar Sep 18 '23 11:09 dblock

@dblock I created a PR that adds the explanation to the docs.

arnabrahman avatar Oct 15 '23 07:10 arnabrahman

@arnabrahman do you think we should close this one?

dblock avatar Jan 08 '24 16:01 dblock

@dblock Maybe. We have added an explanation to the documentation to guide users in avoiding this problem.

However, it's worth noting that the previous Elasticsearch client did not encounter such issues. So, if possible, we should look into the code to see why it behaves differently.

arnabrahman avatar Jan 21 '24 05:01 arnabrahman