aries-cloudagent-python icon indicating copy to clipboard operation
aries-cloudagent-python copied to clipboard

Upgrade to anoncreds via api endpoint

Open jamshale opened this issue 9 months ago • 1 comments

This was originally opened as https://github.com/hyperledger/aries-cloudagent-python/pull/2840 but this addresses some comments and adds improvements, mostly with handling multi instance aca-py scenarios.

The upgrading middleware will now use a cache during upgrades and after the upgrade has been completed. Agents that have the upgrade middleware but never upgrade will still have a very minor performance hit, due to needing to query the is_upgrading record.

There is a sequence diagram at docs/design/UpgradeViaApi.md for describing the design and flow.

In multi instance scenarios the instances which did not receive the upgrade request will now detect a upgrade has been initiated from another instance and begin a polling process to check for when the upgrade completes.

In stand alone agents (single tenant) or mutlitenant admin agents any instance that detects the upgrade has completed will still shutdown and be required to start-up again. This could possibly still be avoided, but I think it's best to reload the context from scratch after the upgrade. However, instead of failing after the upgrade restart, due to the storage-type being askar-anoncreds and the wallet-type config being askar. It will instead get a warning to change the configuration, reload the context as anoncreds and successfully launch. I think this is important to prevent accidents where the configuration isn't getting updated immediately after upgrading.

Both these improvements mean the controller no longer has to be concerned about scaling down the aca-py instances and co-coordinating the wallet-type configuration change. The controller operator simply scales down the instances of controller, upgrades itself to handle the anoncreds endpoints and then hits the upgrade endpoints. As long as the aca-py agent restarts itself after shutting down, the rest will happen automatically.

I looked into doing a block on the webhook queue, but I don't think it's necessary. The incoming webhooks will be blocked by the upgrade middleware and outgoing messages shouldn't cause issues with agents. My understanding of the webhooks, and everything they might do during an upgrade, is limited though, so I might be off on that.

See also https://github.com/hyperledger/aries-cloudagent-python/pull/2881 for the migration guide.

jamshale avatar Apr 29 '24 16:04 jamshale

Got a couple of errors while testing this with the alice/faber demo (errors on the alice side).

I think the response from the upgrade should include some kind of status re what was upgraded, currently the response is status 200 with an empty {}. For example we could return {"success":true,"rec_count":3} or whatever ...

My testing:

Ran local postgres instance:

$ docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d -p 5432:5432 postgres -c 'log_statement=all' -c 'logging_collector=on' -c 'log_destination=stderr'

Ran alice and faber:

`` $ AGENT_PORT_OVERRIDE=8010 POSTGRES=1 ./run_demo run faber --wallet-type askar --revocation $ POSTGRES=1 ./run_demo run alice --wallet-type askar


Connect, issue and revoke several credentials.

Upgrade each wallet using the endpoint.

- Response is status 200 with body `{}`

Agents were shut down and the demo doesn't have an option to restart.

So ...  try again with multitenancy.

Restart postgres and then start alice/faber with multitenancy:

$ AGENT_PORT_OVERRIDE=8010 POSTGRES=1 ./run_demo run faber --wallet-type askar --revocation --multitenant $ POSTGRES=1 ./run_demo run alice --wallet-type askar --multitenant


Connect, issue and revoke several credentials (issue 4 and revoke 3).

Upgrade the alice wallet using the endpoint.

- Response is status 200 with body `{}`

Try a proof request, got an error in alice:

Traceback (most recent call last): File "/home/aries/demo/runners/support/agent.py", line 1085, in admin_request resp.raise_for_status() File "/home/aries/.local/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status raise ClientResponseError( aiohttp.client_exceptions.ClientResponseError: 400, message='Error creating presentation. Error converting link secret: Internal OpenSSL error: OpenSSL error.', url=URL('http://host.docker.internal:8031/present-proof-2.0/records/33bd849c-cfc5-4363-a04c-2b61390deed3/send-presentation')

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/home/aries/demo/runners/agent_container.py", line 579, in handle_present_proof_v2_0 await self.admin_POST( File "/home/aries/demo/runners/support/agent.py", line 1185, in admin_POST Alice | 2024-05-03 17:51:23,684 aries_cloudagent.admin.server ERROR Handler error with exception: Error creating presentation. Error converting link secret: Internal OpenSSL error: OpenSSL error. Alice | Traceback (most recent call last): Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 545, in create_presentation Alice | presentation = await asyncio.get_event_loop().run_in_executor( Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in await Alice | yield self # This tells Task to wait for completion. Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup Alice | future.result() Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result Alice | raise self._exception Alice | File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run Alice | result = self.fn(*self.args, **self.kwargs) Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/types.py", line 616, in create Alice | bindings.create_presentation( Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 763, in create_presentation Alice | do_call( Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 477, in do_call Alice | raise get_current_error(True) Alice | anoncreds.error.AnoncredsError: Error converting link secret: Internal OpenSSL error: OpenSSL error Alice | Alice | The above exception was the direct cause of the following exception: Alice | Alice | Traceback (most recent call last): Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/routes.py", line 1204, in present_proof_send_presentation Alice | pres_ex_record, pres_message = await pres_manager.create_pres( Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/manager.py", line 275, in create_pres Alice | pres_tuple = await pres_exch_format.handler(self._profile).create_pres( Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/formats/indy/handler.py", line 162, in create_pres Alice | return await self.anoncreds_handler.create_pres( Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/formats/anoncreds/handler.py", line 164, in create_pres Alice | indy_proof = await indy_handler.return_presentation( Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/anoncreds/pres_exch_handler.py", line 258, in return_presentation Alice | indy_proof_json = await self.holder.create_presentation( Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 562, in create_presentation Alice | raise AnonCredsHolderError("Error creating presentation") from err Alice | aries_cloudagent.anoncreds.holder.AnonCredsHolderError: Error creating presentation Alice | Alice | The above exception was the direct cause of the following exception: Alice | Alice | Traceback (most recent call last): Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 188, in ready_middleware Alice | return await handler(request) Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 259, in debug_middleware Alice | return await handler(request) Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 430, in check_multitenant_authorization Alice | return await handler(request) Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 495, in setup_context Alice | return await task Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in await Alice | yield self # This tells Task to wait for completion. Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup Alice | future.result() Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result Alice | raise self._exception Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step Alice | result = coro.send(None) Alice | File "/home/aries/aries_cloudagent/admin/server.py", line 222, in upgrade_middleware Alice | return await handler(request) Alice | File "/home/aries/.local/lib/python3.9/site-packages/aiohttp_apispec/middlewares.py", line 45, in validation_middleware Alice | return await handler(request) Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/routes.py", line 1222, in present_proof_send_presentation Alice | await report_problem( Alice | File "/home/aries/aries_cloudagent/protocols/present_proof/v2_0/init.py", line 61, in report_problem Alice | raise http_error_class(reason=err.roll_up) from err Alice | aiohttp.web_exceptions.HTTPBadRequest: Error creating presentation. Error converting link secret: Internal OpenSSL error: OpenSSL error. Alice |


Upgrade the faber wallet using the endpoint.

- Response is status 200 with body `{}`

Try issuing a credential, results in an error on Alice:

15 After receiving credential offer, send credential request Alice | Credential: state = request-sent, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e Alice | Credential: state = credential-received, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e Alice | 2024-05-03 17:54:55,035 aries_cloudagent.protocols.issue_credential.v2_0.formats.anoncreds.handler ERROR Error storing credential: None - Error processing received credential Alice | 2024-05-03 17:54:55,035 aries_cloudagent.messaging.base_handler ERROR Error storing issued credential Alice | Traceback (most recent call last): Alice | Credential: state = abandoned, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e

Credential exchange abandoned Alice | Problem report message: Error processing received credential. Input error [missing field link_secret_blinding_data at line 1 column 839]. Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 197, in store_credential Alice | cred_recvd = await asyncio.get_event_loop().run_in_executor( Alice | Credential: state = done, cred_ex_id = 520199d3-e87f-4c39-a114-1440a8e72c3e Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in await Alice | yield self # This tells Task to wait for completion. Alice | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup Alice | future.result() Alice | File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result Alice | raise self._exception Alice | File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run Alice | result = self.fn(*self.args, **self.kwargs) Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/types.py", line 286, in process Alice | cred_req_metadata = CredentialRequestMetadata.load(cred_req_metadata) Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/types.py", line 146, in load Alice | bindings._object_from_json( Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 568, in _object_from_json Alice | do_call(method, encode_bytes(value), byref(result)) Alice | File "/home/aries/.local/lib/python3.9/site-packages/anoncreds/bindings.py", line 477, in do_call Alice | raise get_current_error(True) Alice | anoncreds.error.AnoncredsError: Input error [missing field link_secret_blinding_data at line 1 column 839] Alice | Alice | The above exception was the direct cause of the following exception: Alice | Alice | Traceback (most recent call last): Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/handlers/cred_issue_handler.py", line 73, in handle Alice | cred_ex_record = await cred_manager.store_credential(cred_ex_record) Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/manager.py", line 625, in store_credential Alice | await cred_format.handler(self.profile).store_credential( Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/formats/indy/handler.py", line 500, in store_credential Alice | return await self.anoncreds_handler.store_credential( Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/formats/anoncreds/handler.py", line 441, in store_credential Alice | raise e Alice | File "/home/aries/aries_cloudagent/protocols/issue_credential/v2_0/formats/anoncreds/handler.py", line 421, in store_credential Alice | cred_id_stored = await holder.store_credential( Alice | File "/home/aries/aries_cloudagent/anoncreds/holder.py", line 206, in store_credential Alice | raise AnonCredsHolderError("Error processing received credential") from err Alice | aries_cloudagent.anoncreds.holder.AnonCredsHolderError: Error processing received credential

ianco avatar May 03 '24 17:05 ianco

Love the idea of a tutorial for doing this using Alice/Faber. Is that possible/reasonable/helpful? You do need to do something first (populate the storage) and then do an upgrade. Even if it does nothing, I think it would be helpful to demonstrate?

swcurran avatar May 03 '24 18:05 swcurran

Tried another test, upgrading faber (and not alice) and everything works fine (can issue and request proof after the upgrade).

Then upgraded alice - same errors as above.

ianco avatar May 03 '24 18:05 ianco

Hmm. I remember seeing that error a long time ago and thought I had fixed it. Possibly an issue with postgres. I'll look into it.

jamshale avatar May 03 '24 18:05 jamshale

Love the idea of a tutorial for doing this using Alice/Faber. Is that possible/reasonable/helpful? You do need to do something first (populate the storage) and then do an upgrade. Even if it does nothing, I think it would be helpful to demonstrate?

It's easy to demonstrate with with alice/faber when using multitenancy (since you don't have to restart the agent). Not sure how to verify that the database is upgraded, probably the GET /multitenancy/wallet/{} endpoint?

For single tenancy we need to provide an option in the demo to restart the agent after it's been upgraded.

ianco avatar May 03 '24 18:05 ianco

Hmm. I remember seeing that error a long time ago and thought I had fixed it. Possibly an issue with postgres. I'll look into it.

I didn't test with sqlite (which is what is used by default for the demo)

ianco avatar May 03 '24 18:05 ianco

I think the response from the upgrade should include some kind of status re what was upgraded, currently the response is status 200 with an empty {}. For example we could return {"success":true,"rec_count":3} or whatever ...

... and if it's a single-tenant upgrade we can include a message that "Agent shut down need to restart" ...

ianco avatar May 03 '24 18:05 ianco

Remind me why do we need a restart in single-tenant mode, but not in multi-tenant mode?

And is there any way to eliminate the requirement?

Maybe we ignore the start up option and use what the storage says?

swcurran avatar May 03 '24 18:05 swcurran

Remind me why do we need a restart in single-tenant mode, but not in multi-tenant mode?

And is there any way to eliminate the requirement?

Maybe we ignore the start up option and use what the storage says?

The restart is because we need to reload the context (all the anoncreds modules, the anoncreds profile).

Currently, it is kind of ignoring the wallet-type configuration on the restart start-up. It's reloading the context and anoncreds modules. However, it seems safer to force the restart for an agent which has an active context. Has already injected other classes into the profile and so on.

That being said, it might be possible to reload the context like it's doing on startup now when the upgrade completion is detected. I didn't do it, because I didn't think a restart was an issue.

jamshale avatar May 03 '24 18:05 jamshale

Haven't been able to reproduce that error yet with the demo and postgres. Also, the integration test ACA-Py Anoncreds Upgrade was supposed to be testing this. Will try and reproduce it a bit more but might need @ianco to show me.

jamshale avatar May 03 '24 18:05 jamshale

Love the idea of a tutorial for doing this using Alice/Faber. Is that possible/reasonable/helpful? You do need to do something first (populate the storage) and then do an upgrade. Even if it does nothing, I think it would be helpful to demonstrate?

It's easy to demonstrate with with alice/faber when using multitenancy (since you don't have to restart the agent). Not sure how to verify that the database is upgraded, probably the GET /multitenancy/wallet/{} endpoint?

You can do this with the /settings endpoint as well now. The wallet.type will be askar-anoncreds after the upgrade has completed.

jamshale avatar May 03 '24 18:05 jamshale

Haven't been able to reproduce that error yet with the demo and postgres. Also, the integration test ACA-Py Anoncreds Upgrade was supposed to be testing this. Will try and reproduce it a bit more but might need @ianco to show me.

OK we can jump on a call if we need to ...

ianco avatar May 03 '24 19:05 ianco

@ianco I'll call you on Monday. I can't replicate it still, but not sure I'm using postgres correctly. I will want the morning to figure it out if it is a problem with actual upgrade and the blinded secret.

I know I had this problem before when I was re-creating the cred defs private and key proof data during the upgrade, but thought it was fixed after I changed the upgrade to use the existing values.

jamshale avatar May 03 '24 19:05 jamshale

That being said, it might be possible to reload the context like it's doing on startup now when the upgrade completion is detected. I didn't do it, because I didn't think a restart was an issue.

My expectation of the “upgrade” process for an operating ACA-Py instance in a Kubernetes environment is:

  • Scale down all current Controller instances using the old CredX implementation.
  • Scale up new Controller instances with updated code to use AnonCreds-RS implementation.
    • The first thing the updated Controller code does is invoke the upgrade endpoint.
    • In practice, of course, only the first instance to start does the actual upgrade. The rest check (possibly waiting…) and discover that the upgrade is complete and carry on without doing anything.
    • As Controller instances come and go (as they do in a Kubebernetes environment), every new instance would invoke the upgrade endpoint.

If the upgrade has to be done by a special instance of the code that has to restart different things, it seems like it is much harder to manage.

That approach also works fine for a non-Kubernetes environment.

swcurran avatar May 03 '24 20:05 swcurran

I think it works the way you are describing.

The controller doesn't have to manage anything. The acapy agents just need to restart after they detect the upgrade completed, like if they crashed. This would be managed by any type of cloud deployment, like kubernetes.

jamshale avatar May 03 '24 20:05 jamshale

I tested again this morning, a few different scenarios.

If I create the wallet (alice) and then immediately upgrade, then there is no problem and I can issue and request proofs.

If I first issue and revoke some credentials (issued 4 revoked 3) and then upgrade alice, then I get the above error when I try to request a proof.

ianco avatar May 06 '24 16:05 ianco

Looks like there's a small change in the link_secret record's formatting I never noticed.

Existing record was saved as json value

{"value":{"ms":"71064680940360014629618879654672290185096545699652080871503389385314050854890"}}

but in anoncreds it's expecting to be saved directly as a string 71064680940360014629618879654672290185096545699652080871503389385314050854890

Hoping it should be a trivial fix. Thanks for finding this.

jamshale avatar May 06 '24 18:05 jamshale

@ianco This has been updated. There was two issues when upgrading a multitenant holder.

  1. Anoncreds expected the link_secret record to be in a slightly different format. The conversion was added to the upgrade process.
  2. When upgrading a subwallet with a credential in the cache, it was using the cache after the upgrade. Causing the new credential of the same type to go into the indycredx path. I flushed the cache during upgrade and it fixed this issue.

Expanded the integration test to cover upgrading the holder subwallet and re-issuing a credential.

jamshale avatar May 07 '24 20:05 jamshale

Also added a small response payload to the upgrade endpoint.

jamshale avatar May 07 '24 20:05 jamshale

@jamshale there are conflicts, I think because I merged the other PR - https://github.com/hyperledger/aries-cloudagent-python/pull/2861

ianco avatar May 07 '24 21:05 ianco

@jamshale there are conflicts, I think because I merged the other PR - #2861

No problem. I thought there might have been more. I'll rebase it in a bit.

jamshale avatar May 07 '24 21:05 jamshale

Also added a small response payload to the upgrade endpoint.

Looks good!

{
  "success": true,
  "message": "Upgrade to anoncreds has been triggered for wallet alice1"
}

ianco avatar May 07 '24 21:05 ianco

Some success! I was able to upgrade both of the alice and faber wallets according to my test scenario above (start alice and faber with askar wallet and in multitenant mode with revocation enabled, issue and revoke a bunch of credentials, then upgrade alice and then faber wallets). However all proof requests from faber to alice return false. I thought it might be related to the other reported bug (#2934) so I deleted all alice credentials from her wallet using the api, issued a new credential (not revoked) and then tried another proof request and it still returned false.

With the askar wallet the proof would still pass as long as there was at least one valid credential, regardless of how many were revoked. So I suspect the issue is in either the anoncreds-rs code or in the aca-py anoncreds-related code.

ianco avatar May 07 '24 21:05 ianco

Some success! I was able to upgrade both of the alice and faber wallets according to my test scenario above (start alice and faber with askar wallet and in multitenant mode with revocation enabled, issue and revoke a bunch of credentials, then upgrade alice and then faber wallets). However all proof requests from faber to alice return false. I thought it might be related to the other reported bug (#2934) so I deleted all alice credentials from her wallet using the api, issued a new credential (not revoked) and then tried another proof request and it still returned false.

With the askar wallet the proof would still pass as long as there was at least one valid credential, regardless of how many were revoked. So I suspect the issue is in either the anoncreds-rs code or in the aca-py anoncreds-related code.

Yes. I think this is the same issue I reported here. I hadn't tried other scenario you described. I thought it was some logic getting the most valid credential, but looks like that might not be the case. I don't think it's related to the upgrade at all, as I can replicate it with fresh anoncreds agents.

I will test some other cases as well but when I hadn't revoked any credentials, the proof verified correctly after the upgrade. It was only when any credentials had been revoked and the wallet was anoncreds that the proof always returned false.

jamshale avatar May 07 '24 22:05 jamshale

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

sonarqubecloud[bot] avatar May 07 '24 22:05 sonarqubecloud[bot]

... but looks like that might not be the case. I don't think it's related to the upgrade at all, as I can replicate it with fresh anoncreds agents.

If it's an existing issue we should probably merge this PR and then deal with it as a separate issue.

ianco avatar May 08 '24 02:05 ianco