kombu

SQS "Signature expired" Error

Open spawn-guy opened this issue 1 year ago • 2 comments

 [2024-02-20 15:39:40,288: CRITICAL/MainProcess] Unrecoverable error: Exception('Request HTTP Error  HTTP 403  Forbidden (b\'<?xml version="1.0"?><ErrorResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><Error><Type>Sender</Type><Code>SignatureDoesNotMatch</Code><Message>Signature expired: 20240220T151817Z is now earlier than 20240220T152440Z (20240220T153940Z - 15 min.)</Message><Detail/></Error><RequestId>bla-bla-bla</RequestId></ErrorResponse>\')')
Traceback (most recent call last):
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/celery/worker/worker.py", line 202, in start
self.blueprint.start(self)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/celery/worker/consumer/consumer.py", line 340, in start
blueprint.start(self)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/celery/worker/consumer/consumer.py", line 742, in start
c.loop(*c.loop_args())
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/kombu/asynchronous/hub.py", line 373, in create_loop
cb(*cbargs)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/kombu/asynchronous/http/curl.py", line 122, in on_readable
return self._on_event(fd, _pycurl.CSELECT_IN)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/kombu/asynchronous/http/curl.py", line 139, in _on_event
self._process_pending_requests()
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/kombu/asynchronous/http/curl.py", line 145, in _process_pending_requests
self._process(curl)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/kombu/asynchronous/http/curl.py", line 191, in _process
request.on_ready(self.Response(
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/promises.py", line 168, in __call__
svpending(*ca, **ck)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/promises.py", line 161, in __call__
return self.throw()
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/promises.py", line 158, in __call__
retval = fun(*final_args, **final_kwargs)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/funtools.py", line 98, in _transback
return callback(ret)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/promises.py", line 161, in __call__
return self.throw()
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/promises.py", line 158, in __call__
retval = fun(*final_args, **final_kwargs)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/funtools.py", line 96, in _transback
callback.throw()
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/vine/funtools.py", line 94, in _transback
ret = filter_(*args + (ret,), **kwargs)
File "/var/app/venv/staging-LQM1lest/lib64/python3.8/site-packages/kombu/asynchronous/aws/connection.py", line 246, in _on_list_ready
raise self._for_status(response, response.read())
Exception: Request HTTP Error  HTTP 403  Forbidden (b'<?xml version="1.0"?><ErrorResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/"><Error><Type>Sender</Type><Code>SignatureDoesNotMatch</Code><Message>Signature expired: 20240220T151817Z is now earlier than 20240220T152440Z (20240220T153940Z - 15 min.)</Message><Detail/></Error><RequestId>bla-bla-bla</RequestId></ErrorResponse>')

I started seeing these messages yesterday, while I was still on celery<5 (due to async support).

Today I migrated to Celery v5 with a modified version of https://github.com/the-wondersmith/celery-aio-pool (modified to support our current Python 3.8).

celery==5.3.6 kombu==5.3.5

I still see this message. The good thing is that the Celery v5 worker restarts afterwards (unlike v4, where it hung indefinitely).

The processes I am running can take a long time to finish. As a mitigation, I tried increasing the message visibility timeout on the SQS side to 30m (from 20m). It is better now, but I still see this message.
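For reference, the timestamps quoted in the error message can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 15-minute window and the three timestamps are taken from the message itself, while the helper name `parse_amz` and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Compact timestamp format used in the error message, e.g. 20240220T151817Z
AMZ_FORMAT = "%Y%m%dT%H%M%SZ"


def parse_amz(stamp):
    """Parse a compact UTC timestamp (hypothetical helper, not a kombu/boto3 API)."""
    return datetime.strptime(stamp, AMZ_FORMAT).replace(tzinfo=timezone.utc)


signed_at = parse_amz("20240220T151817Z")   # when the request was signed
server_now = parse_amz("20240220T153940Z")  # server time when it arrived
max_age = timedelta(minutes=15)             # window quoted in the message

cutoff = server_now - max_age               # 20240220T152440Z in the message
age = server_now - signed_at

print("signature age:", age)
print("expired:", signed_at < cutoff)
```

So the request was signed a bit over 21 minutes before SQS received it, which is outside the 15-minute window the message mentions.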

spawn-guy avatar Feb 20 '24 15:02 spawn-guy

BTW, for the aio_pool to work on Python 3.8 I had to modify the typing (Tuple and Dict) and add a backported version of asyncio.to_thread. Otherwise it works decently (no concurrency support) as a drop-in replacement for kai's celery-pool-asyncio.
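A backport along these lines is one way to get `asyncio.to_thread` on Python 3.8 (it was added in 3.9). This is a minimal sketch that follows the shape of the standard library version, not necessarily the exact code used in the modified pool; `slow_add` is a made-up example function.

```python
import asyncio
import contextvars
import functools
import time


async def to_thread(func, *args, **kwargs):
    """Run func(*args, **kwargs) in the default executor and await the result.

    Sketch of a Python 3.8 backport of asyncio.to_thread (added in 3.9).
    """
    loop = asyncio.get_running_loop()
    # Copy the current context so context variables are visible in the thread.
    ctx = contextvars.copy_context()
    call = functools.partial(ctx.run, func, *args, **kwargs)
    return await loop.run_in_executor(None, call)


def slow_add(a, b):
    # Hypothetical blocking function, used only for the demonstration.
    time.sleep(0.01)
    return a + b


result = asyncio.run(to_thread(slow_add, 1, b=2))
print(result)
```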

spawn-guy avatar Feb 20 '24 15:02 spawn-guy

I debugged the code a bit yesterday, and now I see a few options:

  • Either we ignore the 403 errors (not preferred, but it could work), as is already done for some other errors here: https://github.com/celery/kombu/blob/main/kombu/asynchronous/aws/connection.py#L244
  • Or I have to sts-assume-role, but I already assume the same role from the current EC2 instance profile. The code should understand that the credentials are "refreshable/expiring"; boto3/kombu does not understand this now.
  • Or boto3 learns that the credentials from the EC2 instance profile are "expiring" and need to be "refreshed" every now and then, like in the Java/Ruby/PHP SDKs; see https://github.com/boto/boto3/issues/443
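To make the "refreshable/expiring" idea concrete, here is a rough sketch of a credentials holder that re-fetches shortly before expiry. Everything in it is hypothetical (the `Credentials` and `RefreshingCredentials` classes, the `fetch` callable, the 5-minute margin); it is not the boto3 or kombu API, just an illustration of the behaviour being asked for.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Credentials:
    # Hypothetical container for temporary credentials.
    access_key: str
    secret_key: str
    token: str
    expires_at: datetime


class RefreshingCredentials:
    """Hand out credentials, re-fetching them when they are close to expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        margin: timedelta = timedelta(minutes=5),
        clock: Optional[Callable[[], datetime]] = None,
    ) -> None:
        self._fetch = fetch      # assumed: returns a fresh Credentials object
        self._margin = margin    # refresh this long before expiry
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._current: Optional[Credentials] = None

    def get(self) -> Credentials:
        now = self._clock()
        if self._current is None or self._current.expires_at - now <= self._margin:
            self._current = self._fetch()
        return self._current
```

A caller would then ask for `get()` right before signing each request instead of holding on to one set of keys for the lifetime of the worker.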

What are your thoughts, community?

spawn-guy avatar Feb 28 '24 13:02 spawn-guy