
serverless-offline-sqs - receiving frequent 503: Service Unavailable

Open zoellner opened this issue 5 years ago • 10 comments

The change made by @esetnik in https://github.com/CoorpAcademy/serverless-plugins/commit/40efaf96244d52668d4a2580cbca314e9109d05b seems to be the cause. Setting the timeout to the maximum value results in frequent timeouts; lower values work fine.

zoellner avatar Apr 01 '20 18:04 zoellner

@zoellner any idea why this is the case? AWS specifies 20s as the maximum allowable WaitTimeSeconds and I use this in my projects without any issues.

esetnik avatar Apr 01 '20 18:04 esetnik

I'm not quite sure yet. It might be related to https://github.com/softwaremill/elasticmq/issues/9, although they claim that it is fixed on their end.

My serverless-offline-sqs config is:

  serverless-offline-sqs:
    autoCreate: false
    apiVersion: '2012-11-05'
    endpoint: http://127.0.0.1:9324
    region: us-east-1
    accessKeyId: root
    secretAccessKey: root
    skipCacheInvalidation: false

I'm using the softwaremill/elasticmq:latest Docker image for local SQS.

zoellner avatar Apr 01 '20 18:04 zoellner

I am using localstack, so that could account for the difference.

esetnik avatar Apr 01 '20 19:04 esetnik

So maybe make that a parameter passed through from the serverless.yaml config? That way I can adjust it as needed while you can still use the maximum value?
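A minimal sketch of how such a pass-through could be resolved, assuming a hypothetical `waitTimeSeconds` option under the plugin's config (the option name, the `OfflineSqsOptions` type, and the `resolveWaitTimeSeconds` helper are illustrative, not the plugin's actual API). It falls back to 20s when nothing is set and clamps the value to the 0 to 20 second range that SQS accepts for WaitTimeSeconds:

```typescript
// Hypothetical sketch: `waitTimeSeconds` is an assumed option name,
// not a confirmed serverless-offline-sqs setting.
const SQS_MAX_WAIT_SECONDS = 20; // AWS maximum for WaitTimeSeconds
const DEFAULT_WAIT_SECONDS = 20; // default proposed in this thread

interface OfflineSqsOptions {
  endpoint?: string;
  region?: string;
  waitTimeSeconds?: number; // hypothetical pass-through from serverless.yaml
}

function resolveWaitTimeSeconds(options: OfflineSqsOptions): number {
  const requested = options.waitTimeSeconds;
  // Use the default when the option is missing or not a finite number.
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_WAIT_SECONDS;
  }
  // Clamp to whole seconds within the range SQS accepts (0 to 20).
  return Math.min(SQS_MAX_WAIT_SECONDS, Math.max(0, Math.floor(requested)));
}
```

With something like this, a project that hits timeouts against ElasticMQ could set a lower value, while projects that leave the option unset keep the 20s default.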

zoellner avatar Apr 01 '20 20:04 zoellner

@zoellner actually the maintainer suggested the same thing in my PR https://github.com/CoorpAcademy/serverless-plugins/pull/96#pullrequestreview-377513244. I'd be happy with exposing a configuration for this, except the default should probably still be 20s as this gives the best performance characteristics.

esetnik avatar Apr 01 '20 20:04 esetnik

I'd suggest 15s as a compromise default. That would make it less likely for others to run into similar issues.

zoellner avatar Apr 01 '20 20:04 zoellner

Not opposed to that, but I don't really understand why 15s vs 20s would make a difference from a compatibility perspective. In my experience, almost everyone who uses SQS in production sets it to 20s to reduce costs without any other downsides, so I'm not sure why we'd make it lower here.

esetnik avatar Apr 01 '20 20:04 esetnik

Given that it was at 1s before, I'm not sure how many people had an issue with compatibility before the change. But since I didn't dig deeper to find the true cause of the issue with 20s, I can't make a super strong argument against it either. I just feel that 15 (or even 19) seconds as the default will be a little "safer" for most.

zoellner avatar Apr 01 '20 20:04 zoellner

The reason I changed it from 1s to 20s is that we use real queues in some cases and this change caused a 20x decrease in cost. I’d argue that for most users 20s is the safest setting, since changes to this value can cost real money. I am hesitant to agree that a different default is appropriate solely due to a bug in a third-party library. I think the argument is largely irrelevant if you’re able to supply a PR that makes this value configurable; then I’m happy either way, because I will just set it to 20 for my projects.
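The 20x figure can be reproduced with a rough calculation. This is only a sketch under simplifying assumptions: a single poller, an idle queue where every receive call waits out the full timeout, and back-to-back polling. The `emptyReceivesPerDay` helper is illustrative.

```typescript
// Rough estimate of billable empty receive calls per day for one poller
// on an idle queue, assuming each call waits the full timeout.
const SECONDS_PER_DAY = 24 * 60 * 60;

function emptyReceivesPerDay(waitTimeSeconds: number): number {
  if (waitTimeSeconds <= 0) {
    throw new RangeError("waitTimeSeconds must be positive for this estimate");
  }
  return SECONDS_PER_DAY / waitTimeSeconds;
}

const at1s = emptyReceivesPerDay(1);   // 86400 requests per day
const at20s = emptyReceivesPerDay(20); // 4320 requests per day
const ratio = at1s / at20s;            // 20
```

Real traffic shortens the waits whenever messages arrive, so the actual saving depends on how busy the queue is.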

esetnik avatar Apr 01 '20 20:04 esetnik

Not sure using real queues is a common use case for serverless-offline - the name implies that it should be optimized for offline use cases.

I've just downgraded to the previous version for now. This change was a breaking one, so it should not have been released as a patch version by @CoorpAcademyAdmin.

zoellner avatar Apr 01 '20 22:04 zoellner