sqs-consumer
sqs-consumer copied to clipboard
[Bug]: `MaxListenersExceededWarning` when using newer AWS SDK version
Describe the bug
Hi,
Recently we tried to upgrade our dependencies, as part of this we tried to upgrade @aws-sdk/client-sqs from v3.576.0 to v3.600.0. After the upgrade I noticed we are getting MaxListenersExceededWarning warnings.
After looking into it turns out that there were changes in one of the SDK dependencies and the issue can be triggered starting from version v3.582.0. The change https://github.com/smithy-lang/smithy-typescript/pull/1308 .
I believe the issue is due to reusing the same abort controller in each request, if I set a new AbortController after a successful poll the error goes away.
The full warning:
(node:27582) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
at EventTarget.[kNewListener] (node:internal/event_target:571:17)
at EventTarget.[kNewListener] (node:internal/abort_controller:240:24)
at EventTarget.addEventListener (node:internal/event_target:684:23)
at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/node-http-handler/dist-cjs/index.js:331:23
at new Promise (<anonymous>)
at _NodeHttpHandler.handle (<project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/node-http-handler/dist-cjs/index.js:248:12)
at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/smithy-client/dist-cjs/index.js:192:35
at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/middleware-serde/dist-cjs/index.js:33:30
at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/core/dist-cjs/index.js:165:24
at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
at <project>/node_modules/@aws-sdk/middleware-sdk-sqs/dist-cjs/index.js:92:18
at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:34:22
at Consumer.receiveMessage (<project>/node_modules/sqs-consumer/dist/cjs/consumer.js:214:28)
Your minimal, reproducible example
https://codesandbox.io/p/sandbox/upbeat-jerry-k6pplg
Steps to reproduce
1 .Get the example from the Code Sandbox
2. Start docker compose to run SQS mock service
3. Start the example script: NODE_OPTIONS="--trace-warnings" DEBUG=sqs-consumer npm run start
4. observe the warning
Otherwise:
- upgrade to newer a
@aws-sdk/client-sqsversion (>= 3.582.0) - set the
WaitTimeSecondsin theReceiveMessageCommandto 1 second to trigger the warning faster - observe the warning
Expected behavior
There shouldn't be warning.
How often does this bug happen?
Every time
Screenshots or Videos
No response
Platform
- OS: macOS Sonoma
- Node: v20.13.1.
Package version
v10.3.0
AWS SDK version
>=3.582.0
Additional context
No response
Hey, thanks for raising an issue.
3.6 of the sdk would be a major version, we don't naturally support major version releases as they come out as they come with breaking changes like this.
It would be something that we'd have to look at for our next major/minor release ourselves.
I am rather busy at the moment though, so I would suggest staying on 3.5 for a little while longer.
We'd be happy to take a pr though if this is an issue for anyone.
Hej, thanks for the quick response.
I can reproduce this issue just upgrading to 3.582.0 which if I understood correctly is part of 3.5, right? 🤔
I can look into it and open a PR. My fix during the test was very simple: passing new AbortController per request. Do you see any issue with that?
Ahh sorry, I thought you said it was 3.6, the pr you linked also seems like a breaking change to me.
If I have time I might be able to look into it. In terms of a possible solution, that might be if, I've not looked too deep though.
I'm having the same issue and can confirm that this is not just a warning but continues to leak until the process is out of memory.
If I have spare time this week I'll take a look, unfortunately, until then, I'd have to ask that everyone bares with.
If anyone has the time themselves to look, that'd be super helpful, but if not, you'll have to wait on me. Unfortunately, it has come at a particularly busy time for the BBC.
I've possibly been able to resolve this by pinning the @smithy/node-http-handler package (which is used internally by @aws-sdk) to version 3.0.1
With this, I have been able to upgrade the @aws-sdk packages to version ^3.600.0. This is not a long-term solution but provides a bridging solution until the root cause is addressed.
So, in short, add the following to your dependencies in your package.json file:
"@smithy/node-http-handler": "3.0.1",
Note: After doing this, you will need to delete your package-lock.json file, or npm will install the version pinned in the lock file, which may not be the same as the version you have pinned in your package.json file.
I've deployed this change into our systems and will monitor memory usage over the next few days, but the warning no longer presents.
I hope this is helpful.
After reviewing the impact of the above fix, it does appear that the leak has been resolved, so I'm happy to recommend it as a temporary solution until a more permanent fix is available.
Awesome! So it is a change in the next version that causes the issue, that makes it easier to figure out what's going on, thanks!
(This one: https://github.com/smithy-lang/smithy-typescript/releases/tag/%40smithy/node-http-handler%403.1.0 which is the change mentioned at the top, spot on!)
On further research, this may be because the Aborts are not automatically being cleaned up, there's an old Node issue around this: https://github.com/nodejs/node/issues/46525
Possibly because smithy is adding event listeners: https://github.com/smithy-lang/smithy-typescript/pull/1308/files#diff-98216b22e6b39a1cd6f2ab013304d4964152b554c4ad8fee4182a961b42285d8R192 but not removing them?
Update: I've opened an issue around this here: https://github.com/smithy-lang/smithy-typescript/issues/1318
Update on this, I have published 11.0.0-canary.1 with the latest version of the aws-sdk to test this with that new version and I've made a new example that sets up Docker with localstack for a truer test: https://github.com/bbc/sqs-consumer-starter/tree/main/examples/docker
With this, I have replicated the issue:
Still trying to figure out the best way to resolve it though.
@erichkuba Still I have the issue.
Node version : 18.20.2 "@aws-sdk/client-sqs": "^3.600.0", "@ssut/nestjs-sqs": "^2.1.0",
I tried with @smithy/node-http-handler package and still the issue persists.
Hey, I'm still waiting on a response from AWS on the issue I raised above.
Outside of that, the current suggestion is to do what was described here: https://github.com/bbc/sqs-consumer/issues/500#issuecomment-2185502164, or pin your AWS SDK version to before this change.
Outside of that, if anyone has any ideas for how it might be resolved in SQS Consumer, feel free to open a PR, we are in an election period at the moment, so I'm strapped for time.
Could we avoid adding "me too" comments though please? This just fills up the thread and makes it unreadable for other people coming in.
@joseamt Thanks for bringing this up. I could have been clearer in https://github.com/bbc/sqs-consumer/issues/500#issuecomment-2185502164, and have updated it accordingly.
You need to pin the @smithy/node-http-handler package in your package,json, AND delete your package-lock.json, or the version in the package-lock.json file will be used.
I hope that gets you across the line.
@erichkuba Thanks for the update, it worked 👍 👍 👍 !!!. Please add the comments here if you get any updates from AWS. Once again thanks for the great help
cheers 👍 💯
A PR has been raised by another member of the BBC that may fix this: https://github.com/smithy-lang/smithy-typescript/pull/1331
Still waiting on a response from AWS on the issue, but hopefully they'll review this as a fix.
I had a little time to mess around with this today. I ran the docker example @nicholasgriffintn linked above (but not with tsx — I compiled the TypeScript with tsc and then ran the JS code). The consumer options are:
const options: ConsumerOptions = {
queueUrl: "http://127.0.0.1:4566/000000000000/sqs-consumer-test",
sqs: new SQSClient({
endpoint: "http://127.0.0.1:4566/000000000000/sqs-consumer-test",
// This is the latest version of @smithy/node-http-handler
requestHandler: new NodeHttpHandler({
// Disable keepAlive in order to keep as little state around in memory as possible
httpsAgent: new https.Agent({ keepAlive: false }),
httpAgent: new http.Agent({ keepAlive: false }),
}),
}),
// Artificially fast in order to test the memory pressure over time
pollingWaitTimeMs: 5,
waitTimeSeconds: 0.01,
postReceiveMessageCallback: async () => {
requests += 1;
if (requests % 20 === 0) {
// Print the memory usage
console.log(`${requests}\t\t${process.memoryUsage().heapUsed}`);
}
},
handleMessage: async () => {},
};
Then I run the scripts with:
tsc && \
# Artificially limit the memory available to ensure that the Garbage Collector kicks in regularly \
NODE_OPTIONS='--max-old-space-size=14' \
AWS_REGION=eu-west-1 AWS_ACCESS_KEY_ID=123 AWS_SECRET_ACCESS_KEY=456 \
node ./index.js > some-data-file.tsv
Results
Here are some results for:
- The original production code
- This fails with an out-of-memory error where the line stops
- The original production code but with
{ once: true }set in@smithy/node-http-handler(see https://github.com/smithy-lang/smithy-typescript/pull/1331)- This fails with an out-of-memory error pretty quickly (not surprisingly since unless you're actually aborting, the listeners are never removed)
- With
sqs-consumerpatched so that noabortSignalis passed to the AWS SDK- This is our baseline — whatever we do with
AbortSignals, we're never going to get better performance than this.
- This is our baseline — whatever we do with
Now the interesting bit....
These results are for:
-
No
AbortSignals(our baseline) -
Using a new
AbortSignalfor every request fromsqs-consumer— this isn't wired up so we can actually trigger the abort, but tests how well these are garbage collected -
Using a new
AbortSignalin every request fromsqs-consumerbut wiring it up to the globalAbortSignal:const requestAbortController = new AbortController(); const abortHandler = (reason) => requestAbortController.abort(reason); this.abortController.signal.addEventListener('abort', abortHandler); // Call the AWS SDK passing requestAbortController.signal this.abortController.signal.removeEventListener('abort', abortHandler); -
The same as the previous example, but with patched code in
@smithy/node-http-handlerto remove theAbortSignallisteners when the request finished:if (typeof abortSignal.addEventListener === "function") { abortSignal.addEventListener("abort", onAbort, { once: true }); // This line is new: req.on('close', () => abortSignal.removeEventListener('abort', onAbort)) } -
The original
sqs-consumercode (with a singleAbortSignalshared across all requests) but with the patch to tidy listeners in@smithy/node-http-handler
Discussion / Conclusions
In the graph above, it's clear that the best memory performance comes when not using any AbortSignal — this is not at all surprising. While it might be tempting to try to infer some differences between the performance of the other options, I think that any differences are due to different Garbage Collection processing and that if I re-ran the tests, any differences would be within the measurement error.
From these measurements, I can say that we could make changes to sqs-consumer to fix this issue without any changes to the @smithy/node-http-handler library:
- Use a new
AbortSignalfor each request - Add an
abortlistener to the globalAbortSignalthat triggers the requestAbortSignal - Ensure that the
abortlistener is removed from the globalAbortSignalwhen the request is complete
In the meantime I'll raise another PR for the @smithy/node-http-handler library that removes the event listeners from the AbortSignal when the request has completed (this is slightly more tricky for the fetch handler than the http and https handlers).
As promised: https://github.com/smithy-lang/smithy-typescript/pull/1332
@paulbrimicombe is this error fixed?
I've just ran our Docker setup for 5 minutes (https://github.com/bbc/sqs-consumer-starter/tree/main/examples/docker), and got no errors.
This is with 11.0.0-canary.6, but this is just a version to indicate our support of the new AWS SDK major version, on your side, you'll just need to make sure your package lock lists 3.1.2 or above for the package @smithy/node-http-handler.
If we don't see any further issues, we'll release v11 to the latest tag to indicate our support.
Thanks @paulbrimicombe btw!
Hey, we've now released v11.0.0 of the package (https://github.com/bbc/sqs-consumer/releases/tag/v11.0.0).
I'm going to close this issue as it looks like the update the node-http-handler fixed the issue, please feel free to comment if the issue crops up again, if we receive no further comments, this issue will be auto locked in 30 days.
@nicholasgriffintn @paulbrimicombe @erichkuba Thanks for the update and support!! 👍 💯
This issue has been closed for more than 30 days. If this issue is still occuring, please open a new issue with more recent context.