sqs-consumer SQS stopped polling.

Describe the bug

Polling normally works fine across our various queues, but twice now the consumer has stopped polling and messages have piled up in the queues. The first time, I restarted the server and the queues were polled and processed again immediately. About a month later the same thing happened: polling stopped, messages accumulated, and a restart got the consumer polling and processing again. I've been running several SQS consumers for a long time, but this has now happened twice within 2 months on a production server. No error was emitted on either the 'error' or the 'processing_error' event.

sqs-consumer version: "^5.4.0". aws-sdk version: "^2.585.0".

I've seen a similar issue reported previously, but it was closed with a fix (https://github.com/bbc/sqs-consumer/issues/130). Any idea what the reason could be?

To Reproduce

Expected behaviour

The consumer should not stop polling. Also, why did restarting the server resolve the issue and get the halted consumer polling again?

screenshots

Additional context

I'm using Node.js (v10.19.0) deployed on Docker.

Sep 02 '20 09:09 mashoodrafi6

We are also facing the same.

sqs-consumer: 5.4.0, aws-sdk: 2.611.0, node version: 10

Sep 16 '20 08:09 KencyK

Same issue here.

sqs-consumer: 5.4.0, aws-sdk: 2.688.0, node version: 12

Sep 16 '20 09:09 larsjarek

Also encountering this issue, from time to time.

"sqs-consumer": "^5.4.0", "aws-sdk": "^2.658.0", node version: 12

Also posting a picture from Sentry that caught this error. [Screenshot: Screen Shot 2020-09-22 at 11 16 14 AM]

Sep 22 '20 03:09 Tobska

We just hit this this morning, sqs-consumer 5.4.0, node 10, aws-sdk 2.708.0. No errors emitted.

Sep 23 '20 14:09 mfrobben

We also encountered this today:

sqs-consumer 5.4.0
Node 14.7.0 (using the node:14.7.0-buster Docker image)
aws-sdk 2.743.0

No errors were emitted.

Oct 09 '20 18:10 benjamin-greve-cove

same issue

"sqs-consumer": "^5.4.0", "aws-sdk": "^2.574.0"

node 12

Oct 14 '20 11:10 artur-ma

Can you post your consumer options? There may be a common configuration setting when this issue is occurring (such as keeping the http connection open, or not providing a handler timeout).
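For reference, the two settings mentioned here could look roughly like the sketch below. The option names (`httpOptions.timeout`, `httpOptions.connectTimeout`, `waitTimeSeconds`, `handleMessageTimeout`) exist in aws-sdk v2 and sqs-consumer 5.x, but the region, queue URL, and every numeric value are assumptions chosen for illustration. The one constraint worth keeping is that the socket timeout should be longer than the long-poll wait, otherwise an idle long poll would itself time out.

```typescript
// Sketch only: values are illustrative assumptions, not recommendations.

// Options that would be passed to `new AWS.SQS(...)` (aws-sdk v2).
const sqsClientOptions = {
  region: 'eu-west-1', // hypothetical region
  httpOptions: {
    connectTimeout: 5000, // ms allowed for establishing the connection
    timeout: 25000, // ms of socket inactivity before the request fails
  },
};

// Options that would be passed to `Consumer.create(...)`,
// alongside `sqs` and `handleMessage`.
const consumerOptions = {
  // hypothetical queue URL
  queueUrl: 'https://sqs.eu-west-1.amazonaws.com/123456789012/example-queue',
  waitTimeSeconds: 20, // long-poll duration of each receiveMessage call
  handleMessageTimeout: 30000, // ms before 'timeout_error' is emitted for a slow handler
};
```

With a handler timeout set, a handler that hangs should surface as a `timeout_error` event rather than silently blocking the next poll.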

Oct 15 '20 02:10 achallett

@achallett

batchSize: 10,
custom sqs client with https keepAlive client is used (https://www.npmjs.com/package/agentkeepalive)
visibilityTimeout: 60

Multiple instances (10) of sqs-consumer run in the same Node.js process.

Oct 20 '20 20:10 artur-ma

> Can you post your consumer options? There may be a common configuration setting when this issue is occurring (such as keeping the http connection open, or not providing a handler timeout).

@achallett here you go

AWS config ----> Type: Standard, Encryption: Disabled, DLQ: Disabled, Max message size: 256 kB, Message retention period: 4 days, Default visibility timeout: 30 seconds

We also listen to several other SQS queues like this in the same app:

import AWS from 'aws-sdk';
import { Consumer } from 'sqs-consumer';
import { awsConfig } from '../../common/config/config';
import { Logger } from '../../common/config/logger';

const logger = Logger.getInstance(module);

// Build an SQS client for the configured region.
const sqs = () => {
  AWS.config.update({
    region: awsConfig.region,
  });
  return new AWS.SQS();
};

/**
 * Consumer listening to messages from AWS SQS.
 */
const sqsAudit = Consumer.create({
  queueUrl: awsConfig.sqsAuditUrl,
  sqs: sqs(),
  handleMessage, // defined in a different file
  messageAttributeNames: ['event'],
});

sqsAudit.on('error', (err: Error) => {
  logger.error(`Error while interacting with queue: ${err}`);
});

sqsAudit.on('processing_error', (err: Error) => {
  logger.error(`Error while processing message: ${err.message}`);
});

sqsAudit.on('timeout_error', (err: Error) => {
  logger.error(`Handle message timed out: ${err}`);
});

export default sqsAudit;

Oct 28 '20 09:10 KencyK

Could you please share an update on this issue, @achallett?

Nov 21 '20 15:11 Adil93

Could be an upstream issue? There are similar reports in the AWS repo with no clear resolution, e.g. https://github.com/aws/aws-sdk-js-v3/issues/6015

If so, should we implement our own backup timeout for receiveMessage?
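A backup timeout of this kind could be sketched as a small promise wrapper. This is a minimal illustration, not code from sqs-consumer: `withTimeout` is a hypothetical helper name, and it only stops waiting for the promise; it does not cancel the underlying HTTP request.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
// Hypothetical helper: it stops waiting but does not abort the request itself.
function withTimeout<T>(work: Promise<T>, ms: number, label: string = 'operation'): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);

    // Whichever happens first wins; the timer is cleared on both paths
    // so it cannot fire later or keep the process alive.
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

With aws-sdk v2 this might be used as `withTimeout(sqs.receiveMessage(params).promise(), 30000, 'receiveMessage')`, where the 30-second limit is an assumed value that should stay above the long-poll wait time.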

Dec 01 '20 15:12 AG-Teammate

Same issue in my case

  public startConsumer = (queueUrl: string) => {
    console.log(`Starting consumer, ${queueUrl}`);
    Consumer.create({
      queueUrl,
      handleMessage: (msg) =>
        this.consumer.handle(JSON.parse(<string>msg.Body)).catch((error) => {
          console.log('ConsumerRunner error:', error);

          throw error;
        }),
    }).start();
  };

Jan 05 '21 15:01 pawelszczerbicki

Here's a possible workaround. Do you foresee any problems with this approach?

// Check every 30 seconds to see if poller is still running. If not, then re-start it.

setInterval(() => {
  const isRunning = app.isRunning

  console.info(`sqs poller isRunning? : ${isRunning}`)

  if (!isRunning) {
    console.warn('sqs poller is not running. Re-starting now.')
    app.start()
  }
}, 30000)
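One caveat with the interval check above: as far as I can tell from the 5.x source, `isRunning` only reflects whether `stop()` has been called, so a consumer whose poll is hung would probably still report `true`. A variation is to track the time of the last consumer event instead. The sketch below is an assumption-laden illustration: `StallDetector` is a hypothetical class, and the thresholds in the wiring comment are made up.

```typescript
// Tracks when the consumer last showed signs of life.
// Hypothetical helper, not part of sqs-consumer.
class StallDetector {
  private readonly maxIdleMs: number;
  private lastActivity: number;

  constructor(maxIdleMs: number, now: number = Date.now()) {
    this.maxIdleMs = maxIdleMs;
    this.lastActivity = now;
  }

  // Call whenever the consumer emits an event that proves polling is alive.
  touch(now: number = Date.now()): void {
    this.lastActivity = now;
  }

  // True when no activity has been recorded for longer than maxIdleMs.
  isStalled(now: number = Date.now()): boolean {
    return now - this.lastActivity > this.maxIdleMs;
  }
}

// Possible wiring (assumed values; `app` is the Consumer instance):
//   const detector = new StallDetector(60000);
//   app.on('empty', () => detector.touch());
//   app.on('message_received', () => detector.touch());
//   setInterval(() => {
//     if (detector.isStalled()) process.exit(1); // let Docker/Kubernetes restart it
//   }, 30000);
```

Because a healthy long-polling consumer emits `empty` or `message_received` regularly, a long silence is a reasonable stall signal. Exiting the process is suggested here only because the reports in this thread say a restart recovers polling; whether `stop()` followed by `start()` recovers a hung poll is not established.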

Feb 03 '21 10:02 ariesmcrae

Fixing problems by continuously restarting is not a good approach :) In my opinion it's far better to find and solve the source of the problem :)

Feb 03 '21 11:02 pawelszczerbicki

Hi,

I am facing the same scenario in my product, which is going live in a week. Is there any good workaround for this? I don't have time to remove this library now. @AG-Teammate @pawelszczerbicki @ariesmcrae @KencyK @Tobska @mfrobben

Feb 12 '21 12:02 swapnil0545

Does this help? Has anyone tried it, or does anyone know how to apply it? https://stackoverflow.com/questions/37111431/amazon-sqs-with-aws-sdk-receivemessage-stall

Feb 13 '21 05:02 swapnil0545

Facing this issue as well. It's freezing in my production environment.

Jun 30 '21 03:06 ashprojects

I removed the await keyword from the function called inside the SQS consumer and handle retries myself; since then it hasn't occurred.

Jun 30 '21 04:06 mashoodrafi006

I have the same issue. I use NestJS with the @ssut/nestjs-sqs library, which depends on sqs-consumer. After a few days the consumer can no longer receive messages from AWS (the microservice containing the consumer keeps working properly), but when the Kubernetes pod is restarted, the consumer in my app starts polling and processing messages from the queues again.

node version: 12.19

"@nestjs/core": "^7.6.15",
"aws-sdk": "^2.806.0",
"@ssut/nestjs-sqs": "^1.0.0",
    -> ssut/nestjs-sqs dependencies
         "aws-sdk": "^2.728.0",
         "sqs-consumer": "^5.4.0",
         "sqs-producer": "^2.0.2",

Register SqsModule in app.module.ts

SqsModule.registerAsync({
  imports: [AppConfigModule],
  useFactory: async (configService: ConfigService) => {
    const sqs = new AWS.SQS({          
      accessKeyId: configService.get(Configuration.AWS_ACCESS_KEY_ID), 
      secretAccessKey: configService.get(Configuration.AWS_SECRET_ACCESS_KEY),
      region: configService.get(Configuration.AWS_REGION),
    });
    
    return {
      consumers: [
        {
          name: `${configService.get(Configuration.QUEUE_NAME)}`,
          queueUrl: `${configService.get(Configuration.QUEUE_URL)}`,
          sqs
        },           
      ],
      producers: [],
    };
  },
  inject: [AppConfigService],
}),

Consumer Service

@Injectable()
export class SQSMessageHandler {

  @SqsMessageHandler(`${process.env.QUEUE_NAME}`)
  public async handleMessage(message: AWS.SQS.Message) {
    this.logger.debug(`Incoming message: ${message.MessageId} `, SQSMessageHandler.name);

    ...
  }
}

Jul 07 '21 18:07 titobundy

> I removed the await keyword from the function called inside the SQS consumer and handle retries myself; since then it hasn't occurred.

Hi @mashoodrafi006, can you please elaborate on how you handled retries on errors?

Jul 22 '21 10:07 anoop-chauhan

I wrote a cron job that bulk-updates the data between microservices every hour, to patch up the events that were not consumed. @anoop-chauhan

Jul 22 '21 13:07 mashoodrafi006

I was facing this issue because the "handleMessage" of the consumer instance was not resolving for a specific poll, which caused the consumer to stall. Fix: made "handleMessage" resolve in all possible cases, and added better error handling.
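One common way a handler can fail to settle is a hand-written Promise around a callback API whose error branch never calls `reject`. The sketch below illustrates that pattern and a fix; it is an assumption about the kind of bug described here, not this poster's actual code, and `legacySave`, `saveNeverSettlesOnError`, and `save` are hypothetical names.

```typescript
type SaveCallback = (err: Error | null, id?: string) => void;

// Hypothetical callback-style dependency; it fails for an empty body.
function legacySave(body: string, cb: SaveCallback): void {
  setTimeout(() => {
    if (body.length === 0) cb(new Error('empty body'));
    else cb(null, `saved:${body.length}`);
  }, 0);
}

// Buggy: on error neither resolve nor reject is called, so a handler
// awaiting this promise hangs forever.
function saveNeverSettlesOnError(body: string): Promise<string> {
  return new Promise<string>((resolve) => {
    legacySave(body, (err, id) => {
      if (!err && id !== undefined) resolve(id);
    });
  });
}

// Fixed: every branch settles the promise.
function save(body: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    legacySave(body, (err, id) => {
      if (err) return reject(err);
      if (id === undefined) return reject(new Error('no id returned'));
      resolve(id);
    });
  });
}

// Hypothetical handler: a failure now rejects instead of hanging.
async function handleMessage(message: { Body?: string }): Promise<void> {
  await save(message.Body ?? '');
}
```

With the fixed version, a failing message makes `handleMessage` reject, which sqs-consumer reports through the `processing_error` event, and the consumer can move on to the next poll.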

Sep 18 '21 14:09 karthikpawar

> I was facing this issue because the "handleMessage" of the consumer instance was not resolving for a specific poll, which caused the consumer to stall. Fix: made "handleMessage" resolve in all possible cases, and added better error handling.

Please can you elaborate on this? I've had the same issue for a long time now.

Sep 30 '21 07:09 devdabiri

So what is a possible workaround?

Nov 01 '21 07:11 code-xhyun

Has anyone found a solution for this? I am still facing this issue.

Jan 12 '22 20:01 srprinz

I am facing this issue, any solution?

Feb 04 '22 09:02 msuganthan

I'm facing the same issue here. Any solution?

Mar 16 '22 13:03 judiel-camonapp

Facing the same issue as well.

@karthikpawar could you elaborate a bit on the changes you made to handleMessage / what case was causing it to not resolve?

Was it an unresolved Promise? Were you not returning within a function somewhere?

Apr 07 '22 17:04 hayes-crowley

> I have the same issue. I use NestJS with the @ssut/nestjs-sqs library, which depends on sqs-consumer. After a few days the consumer can no longer receive messages from AWS (the microservice containing the consumer keeps working properly), but when the Kubernetes pod is restarted, the consumer in my app starts polling and processing messages from the queues again.

Hi. Is this still an issue? I'm looking into a Nest module that works with SQS and am still not sure whether I should use this package with nestjs-sqs or write some boilerplate with aws-sdk myself. Thanks in advance for any advice on this.

May 16 '22 08:05 topmonroe9

Facing the same issue. I am using the @ssut/nestjs-sqs library, which uses this one internally.

Is there any update on this?

May 30 '22 10:05 prakash-velotio