serverless-offline-sqs - receiving frequent 503: Service Unavailable
The change made by @esetnik in https://github.com/CoorpAcademy/serverless-plugins/commit/40efaf96244d52668d4a2580cbca314e9109d05b seems to be the cause. Setting the wait timeout to the maximum value results in frequent timeouts (503: Service Unavailable). Lower values work fine.
@zoellner any idea why this is the case? AWS specifies 20s as the maximum allowable WaitTimeSeconds and I use this in my projects without any issues.
I'm not quite sure yet. It might be related to https://github.com/softwaremill/elasticmq/issues/9, although they claim that it is fixed on their end.
My serverless-offline-sqs config is:
serverless-offline-sqs:
  autoCreate: false
  apiVersion: '2012-11-05'
  endpoint: http://127.0.0.1:9324
  region: us-east-1
  accessKeyId: root
  secretAccessKey: root
  skipCacheInvalidation: false
I'm using the softwaremill/elasticmq:latest docker image for local SQS.
I am using localstack so that could account for the difference
So maybe make that a parameter passed through from the serverless.yaml config? That way I can adjust it as needed while you can still use the maximum value?
@zoellner actually the maintainer suggested the same thing in my PR https://github.com/CoorpAcademy/serverless-plugins/pull/96#pullrequestreview-377513244. I'd be happy with exposing a configuration for this, except the default should probably still be 20s as this gives the best performance characteristics.
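For illustration, here is a minimal sketch of how such an option could be resolved on the plugin side. It assumes a hypothetical `waitTimeSeconds` key in the serverless-offline-sqs config block. The key name, the helpers `resolveWaitTimeSeconds` and `buildReceiveParams`, and the clamping behaviour are assumptions made for this example, not the plugin's actual API. The 20s ceiling is the AWS maximum for WaitTimeSeconds mentioned above.

```typescript
// Hypothetical shape of the plugin options; `waitTimeSeconds` is an assumed key.
interface SqsOfflineOptions {
  endpoint?: string;
  waitTimeSeconds?: number;
}

const MAX_WAIT_TIME_SECONDS = 20; // AWS maximum for WaitTimeSeconds
const DEFAULT_WAIT_TIME_SECONDS = 20; // default proposed in this thread

// Returns the configured wait time, falling back to the default when unset
// and clamping to the 0..20 range that SQS accepts.
function resolveWaitTimeSeconds(options: SqsOfflineOptions): number {
  const value = options.waitTimeSeconds;
  if (value === undefined || !Number.isFinite(value)) {
    return DEFAULT_WAIT_TIME_SECONDS;
  }
  return Math.min(MAX_WAIT_TIME_SECONDS, Math.max(0, Math.floor(value)));
}

// Builds the parameters that would be passed to an SQS ReceiveMessage call.
function buildReceiveParams(queueUrl: string, options: SqsOfflineOptions) {
  return {
    QueueUrl: queueUrl,
    WaitTimeSeconds: resolveWaitTimeSeconds(options),
  };
}
```

With something like this, a setup that hits the 503s against ElasticMQ could set a lower value such as 15, while anyone who omits the key would keep the 20s default.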
I'd suggest 15s as a compromise default. Would have a lower likelihood of others running into similar issues.
Not opposed to that, but I don't really understand why 15s vs 20s would make a difference from a compatibility perspective. In my experience almost everyone who uses SQS in production sets it to 20s to reduce costs, without any other downsides, so I'm not sure why we'd make it lower here.
Given that it was at 1s before, I'm not sure how many people had an issue with compatibility before the change. But since I didn't dig deeper to find the true cause of the issue with 20s, I can't make a super strong argument against it either. I just feel that 15 (or even 19) seconds as the default would be a little "safer" for most.
The reason I changed it from 1s to 20s is that we use real queues in some cases, and this change caused a 20x decrease in cost. I’d argue that for most users 20s is the safest setting, since changes to this value can cost real money. I am hesitant to agree that a different default is appropriate solely due to a bug in a third-party library. That said, I think the argument is largely irrelevant if you’re able to supply a PR that makes this value configurable. Then I’m happy either way, because I will just set it to 20 for my projects.
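The 20x figure can be checked with a rough calculation. This sketch assumes a single poller on an idle queue, where each ReceiveMessage call blocks for the full wait time and is reissued immediately, and it ignores network latency. The helper name `emptyPollsPerDay` is made up for this example.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Number of ReceiveMessage calls one poller makes per day on an idle queue,
// assuming every call waits the full WaitTimeSeconds before returning empty.
function emptyPollsPerDay(waitTimeSeconds: number): number {
  if (waitTimeSeconds <= 0) {
    throw new RangeError("waitTimeSeconds must be positive for this estimate");
  }
  return Math.ceil(SECONDS_PER_DAY / waitTimeSeconds);
}

const pollsAt1s = emptyPollsPerDay(1); // 86400 requests per day
const pollsAt20s = emptyPollsPerDay(20); // 4320 requests per day
const ratio = pollsAt1s / pollsAt20s; // 20

console.log({ pollsAt1s, pollsAt20s, ratio });
```

Since SQS bills per request, the request count is what drives the cost on a real queue. Against a local ElasticMQ or localstack endpoint the count has no billing impact.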
Not sure using real queues is a common use case for serverless-offline - the name implies that it should be optimized for offline use cases.
I've just downgraded to the previous version for now. Since this introduced a breaking change, it should not have been released as a patch version by @CoorpAcademyAdmin.