Agent shutdown does not terminate running process
When ACA-Py is shutdown because of an error it does not shutdown the process. I just get the following error:
503: Shutdown in progress
Is this intentional / is there something I can do about it? I have a docker setup that automatically restarts services on shutdown, but it doesn't work because the actual process is never terminated.
Ran into the same error here.
Basically, When I attempted to lookup for a credential definition that no longer exists on the Ledger. ACA-PY simply crashes and shows this error 503: shutdown in progress. Same behaviour for looking up schemas as well.
ACA-PY logs
2023-07-30 18:28:53 Traceback (most recent call last):
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/asyncio/tasks.py", line 180, in _step
2023-07-30 18:28:53 result = coro.send(None)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/messaging/credential_definitions/routes.py", line 460, in credential_definitions_fix_cred_def_wallet_record
2023-07-30 18:28:53 schema_response = await ledger.get_schema(schema_seq_no)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/ledger/indy_vdr.py", line 413, in get_schema
2023-07-30 18:28:53 return await self.fetch_schema_by_seq_no(int(schema_id))
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/ledger/indy_vdr.py", line 495, in fetch_schema_by_seq_no
2023-07-30 18:28:53 f"Could not get schema from ledger for seq no {seq_no}"
2023-07-30 18:28:53 aries_cloudagent.ledger.error.LedgerTransactionError: Could not get schema from ledger for seq no 843825
2023-07-30 18:28:53 2023-07-30 21:28:53,061 aries_cloudagent.admin.server ERROR Shutdown with Could not get schema from ledger for seq no 843825
2023-07-30 18:28:53 2023-07-30 21:28:53,061 aiohttp.server ERROR Error handling request
2023-07-30 18:28:53 Traceback (most recent call last):
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aiohttp/web_protocol.py", line 433, in _handle_request
2023-07-30 18:28:53 resp = await request_handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aiohttp/web_app.py", line 504, in _handle
2023-07-30 18:28:53 resp = await handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aiohttp/web_middlewares.py", line 117, in impl
2023-07-30 18:28:53 return await handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/admin/server.py", line 173, in ready_middleware
2023-07-30 18:28:53 return await handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/admin/server.py", line 210, in debug_middleware
2023-07-30 18:28:53 return await handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aiohttp_apispec/middlewares.py", line 45, in validation_middleware
2023-07-30 18:28:53 return await handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/admin/server.py", line 330, in check_token
2023-07-30 18:28:53 return await handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/admin/server.py", line 379, in check_multitenant_authorization
2023-07-30 18:28:53 return await handler(request)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/admin/server.py", line 423, in setup_context
2023-07-30 18:28:53 return await task
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/asyncio/tasks.py", line 250, in _wakeup
2023-07-30 18:28:53 future.result()
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/asyncio/tasks.py", line 180, in _step
2023-07-30 18:28:53 result = coro.send(None)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/messaging/credential_definitions/routes.py", line 460, in credential_definitions_fix_cred_def_wallet_record
2023-07-30 18:28:53 schema_response = await ledger.get_schema(schema_seq_no)
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/ledger/indy_vdr.py", line 413, in get_schema
2023-07-30 18:28:53 return await self.fetch_schema_by_seq_no(int(schema_id))
2023-07-30 18:28:53 File "/home/indy/.pyenv/versions/3.6.13/lib/python3.6/site-packages/aries_cloudagent/ledger/indy_vdr.py", line 495, in fetch_schema_by_seq_no
2023-07-30 18:28:53 f"Could not get schema from ledger for seq no {seq_no}"
2023-07-30 18:28:53 aries_cloudagent.ledger.error.LedgerTransactionError: Could not get schema from ledger for seq no 843825
2023-07-30 18:28:53 2023-07-30 21:28:53,067 aiohttp.access INFO 172.21.0.1 [30/Jul/2023:21:28:52 +0000] "POST /credential-definitions/GsWeUL2xLRFzchznmdbF3F%3A3%3ACL%3A843825%3AHealthCard/write_record HTTP/1.1" 500 406 "http://localhost:8031/api/doc" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
2023-07-30 18:28:58 2023-07-30 21:28:58,063 aries_cloudagent.ledger.indy_vdr DEBUG Closing pool ledger after timeout
2023-07-30 18:29:08 2023-07-30 21:29:08,337 aiohttp.access INFO 172.21.0.1 [30/Jul/2023:21:29:08 +0000] "GET /credential-definitions/GsWeUL2xLRFzchznmdbF3F%3A3%3ACL%3A843825%3AHealthCard HTTP/1.1" 503 194 "http://localhost:8031/api/doc" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
2023-07-30 18:29:46 2023-07-30 21:29:46,458 aiohttp.access INFO 172.21.0.1 [30/Jul/2023:21:29:46 +0000] "GET /credential-definitions/GsWeUL2xLRFzchznmdbF3F%3A3%3ACL%3A843825%3AHealthCard HTTP/1.1" 503 194 "http://localhost:8031/api/doc" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
@esune — let’s talk about this one. I think we should get it fixed sooner than later.
Background for whomever takes this on.
The behavior is, somewhat, intentional. It is related to the readiness and liveliness endpoints that were added to better support hosting within a Kubernetes environment; https://github.com/hyperledger/aries-cloudagent-python/pulls?q=is%3Apr+liveliness+is%3Aclosed
The 503 error is an indication to Kubernetes that ACA-Py has encountered an unrecoverable error and the pod should be restarted.
That covers the intentional part. As for the "somewhat, intentional" part, ACA-Py does not terminate itself under these conditions. Perhaps it would be appropriate for ACA-Py to have an option to terminate it's own process after a given time when used outside a container orchestration environment.