aries-cloudagent-python
aries-cloudagent-python copied to clipboard
fix(credo-interop): various didexchange and did:peer related fixes
Fixes #2742 and various other issues discovered while testing ACA-Py against AFJ.
- Send and accept
didexchange/1.1
as handshake protocol in OOB - Credo did not like the fact that we were including both
jwk
andkid
in the protected headers of signed attachments. Thekid
was redundant so I removed it. - ~Credo does not accept
Multikey
verification method type yet (I opened a PR though: https://github.com/openwallet-foundation/credo-ts/pull/1720 -- tests failing :smile:). So I adjusted our did:peer:4 doc creation to useEd25519VerificationKey2020
for now.~ Multikey is now supported by Credo so I'll revert this change.
Quality Gate passed
Kudos, no new issues were introduced!
0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication
@genaris just FYI, with these changes, ACA-Py and AFJ can now successfully complete OOB + DIDExchange with did:peer:4 (or AFJ emitting did:peer:1 and ACA-Py emitting did:peer:4, I believe). I commented on an issue in Credo on issues with did:peer:2 and tagged you. ~No promises but I may take a stab at updating the did:peer:2 resolution to match the updates to the spec.~ Timo made the very valid point on that issue that this would cause issues with backwards compatibility. So I think focusing on did:peer:4 is fair for now! Sorry for the noise!
Wow nice work @dbluhm ! Not only fixes in ACA-Py but also in Credo. That's what I'm talking about! 😄
Now that your Multikey support PR is merged in Credo main branch, I guess you can re-attempt using it.
I haven't gotten this quite right yet. Checking DID Exchange backwards compatibility with earlier versions of ACA-Py, I get this error:
alice_1 | Traceback (most recent call last):
alice_1 | File "/root/.cache/pypoetry/virtualenvs/aries-cloudagent-VA82Wl8V-py3.9/lib/python3.9/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
alice_1 | resp = await request_handler(request)
alice_1 | File "/root/.cache/pypoetry/virtualenvs/aries-cloudagent-VA82Wl8V-py3.9/lib/python3.9/site-packages/aiohttp/web_app.py", line 543, in _handle
alice_1 | resp = await handler(request)
alice_1 | File "/root/.cache/pypoetry/virtualenvs/aries-cloudagent-VA82Wl8V-py3.9/lib/python3.9/site-packages/aiohttp/web_middlewares.py", line 114, in impl
alice_1 | return await handler(request)
alice_1 | File "/usr/src/app/aries_cloudagent/admin/server.py", line 181, in ready_middleware
alice_1 | return await handler(request)
alice_1 | File "/usr/src/app/aries_cloudagent/admin/server.py", line 218, in debug_middleware
alice_1 | return await handler(request)
alice_1 | File "/root/.cache/pypoetry/virtualenvs/aries-cloudagent-VA82Wl8V-py3.9/lib/python3.9/site-packages/aiohttp_apispec/middlewares.py", line 45, in validation_middleware
alice_1 | return await handler(request)
alice_1 | File "/usr/src/app/aries_cloudagent/admin/server.py", line 451, in setup_context
alice_1 | return await task
alice_1 | File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in __await__
alice_1 | yield self # This tells Task to wait for completion.
alice_1 | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
alice_1 | future.result()
alice_1 | File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
alice_1 | raise self._exception
alice_1 | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
alice_1 | result = coro.send(None)
alice_1 | File "/usr/src/app/aries_cloudagent/protocols/out_of_band/v1_0/routes.py", line 296, in invitation_receive
alice_1 | result = await oob_mgr.receive_invitation(
alice_1 | File "/usr/src/app/aries_cloudagent/protocols/out_of_band/v1_0/manager.py", line 509, in receive_invitation
alice_1 | oob_record = await self._perform_handshake(
alice_1 | File "/usr/src/app/aries_cloudagent/protocols/out_of_band/v1_0/manager.py", line 883, in _perform_handshake
alice_1 | raise OutOfBandManagerError(
alice_1 | aries_cloudagent.protocols.out_of_band.v1_0.manager.OutOfBandManagerError: Unable to create connection. Could not perform handshake using any of the handshake_protocols (supported [None])
The updates for the DID Exchange version were inadvertently breaking. I'll have to look deeper. My first pass was not much more than a copy-paste from OOB.
I've spent a fair amount of time sifting through the behavior of ACA-Py as it relates to handling messages from protocol versions different than the explicitly supported versions.
What should happen, according to RFC 0003, is that if a message with a minor version greater than what is supported is received, we handle the message still and respond anyways with our lower protocol version. Through this, the response recipient knows that there is a version mismatch and that some fields may have been ignored.
For example, if we get didexchange/1.1/request
but we only support didechange/1.0/request
explicitly, we will do our best to handle the message and respond with didexchange/1.0/response
.
As it turns out, recent past releases of ACA-Py emit this error on receipt of a didexchange/1.1/request
:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
result = coro.send(None)
File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/core/dispatcher.py", line 142, in handle_message
(message, warning) = await self.make_message(
File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/core/dispatcher.py", line 293, in make_message
message_cls = registry.resolve_message_class(message_type)
File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/core/protocol_registry.py", line 243, in resolve_message_class
return ClassLoader.load_class(msg_cls)
File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/utils/classloader.py", line 98, in load_class
if "." in class_name:
TypeError: argument of type 'NoneType' is not iterable
This essentially means that a new ACA-Py release would be incompatible with past versions of ACA-Py when using DID Exchange.
I could adjust our behavior as a newer version of ACA-Py to attempt to respond with 1.0
still but this, I think, would technically be out of spec, on top of being somewhat of a pain. Realistically, breaking backwards compatibility would probably cause more pain though.
Thoughts? @swcurran @ianco
I’m confused. Would we not want to add support for 1.1 and handle both 1.0 and 1.1? Presumably we want to support did_rotate~attach
? The 003 RFC says that is the ideal — respond with the same version as received.
Within a given major version of a protocol, an agent should:
- Respond to a minimum supported minor version, defaulting to "0".
- An agent SHOULD keep minimum supported minor version at "0" unless it is unsecure or extremely complicated to do so.
- Respond with or initiate a protocol instance the current fully supported minor version.
The way I interpret that second point from the semver rules, it seems to suggest that we should always respond with our "most current" minor version. By doing so, we make it clear to the other party that they may experience feature degradation or minor version mismatch since the language describing sending a problem report with a warning has been removed from the protocol:
Prior wording of this protocol included the following suggestion that is now considered deprecated. See note below about deprecating the "warning" problem reports.
In addition to responding with the protocol message (if necessary), the agent MAY also want to send a warning problem-report message with code version-with-degraded-features.
And:
Prior wording of this protocol included the following that is now considered deprecated. See note below about deprecating the "warning" problem reports.
In addition to responding with the protocol message, an agent MAY send a warning problem-report message with code fields-ignored-due-to-version-mismatch
Note: The deprecation of the "warning" problem-reports in cases of minor version mismatches is because the recipient of the response can detect the mismatch by looking at the PIURI, making the "warning" unnecessary, and because the problem-report message may be received after (and definitely at a different time than) the response message, and so the warning is of very little value to the recipient. Recipients should still be aware that minor version mismatch warning problem-report messages may be received and handle them appropriately, likely by quietly ignoring them.
I might be splitting hairs :sweat_smile: I think the ideal would be to only "implement" a single minor version of any major version of a protocol. Honestly wondering if DID Exchange 1.1 should have been DID Exchange 2.0 because, though did_rotate is technically optional, it will break handling of the messages still for did_doc~attach to be absent.
Sorry — I’m not understanding the issue. Probably we need to discuss. It would be good to get a clear outline of what is happening, what could be done easily and what you recommend we do.
- ACA-Py current minor version is…
- Should we change that?
- Receive DID Exchange 1.0 request
- ACA-Py currently …
- ACA-Py should be change to ...
- Receive DID Exchange 1.1 request
- ACA-Py currently…
- ACA-Py should be changed to …
I think the ideal is that ACA-Py be changed to have a current minor version of 1.1 and handle both 1.0 and 1.1 requests. However, I gather that is problematic…
Thanks!
Yeah, okay, allow me to slow down and take a deep breath for a moment lol.
The ideal: ACA-Py always just sends didexchange/1.1 since that reduces complexity. However, this would break backwards compatibility. Theoretically, the way DIDComm protocol versions should be handled, past versions of ACA-Py should have been able to work with 1.1; but, due to a bug in past versions, it cannot handle versions of the protocol greater than its "current minor version" (1.0).
(As much as it chagrins me to have explicit different paths for the same major version of a protocol rather than being able to just ignore some attributes) the right answer here is to support both 1.0 and 1.1, responding in kind to requests. I believe it should be possible to include both didexchange/1.0
and didexchange/1.1
in the OOB handshake_protocol list.
My biggest concern is just managing complexity in the DID Exchange protocol implementation. It's already quite dense. I'll continue to work to find a balance between just getting the dang feature implemented and not doing things that will make me sad later lol.
After looking at @swcurran's clarification PR in Aries RFC, I realized that the DID Exchange version was not explicitly changed on its RFC. Some time ago it was agreed to do so (see https://github.com/hyperledger/aries-rfcs/pull/795#issuecomment-1747033885) and we reflected that into Credo implementation, but now I'm not sure if it's clear for everyone.
I think it's the right thing to do although this makes things harder to implement in ACA-Py due to the aforementioned bug in previous versions. BTW, do you think if as a workaround (until unqualified dids support is completely removed) it could be possible to use discover-features to query for DID Exchange support and, based on that, send 1.1 or 1.0 requests? I guess it would be tricky, since a connection is not established yet, but maybe possible somehow?
After looking at @swcurran's clarification PR in Aries RFC, I realized that the DID Exchange version was not explicitly changed on its RFC.
https://github.com/hyperledger/aries-rfcs/blob/main/features/0023-did-exchange/README.md#11----signed-rotations-without-did-documents
It looks like the version bump was made but that just the title and metadata didn't get updated. Or at least that's how I read the intent and my memory of the discussions.
BTW, do you think if as a workaround (until unqualified dids support is completely removed) it could be possible to use discover-features to query for DID Exchange support and, based on that, send 1.1 or 1.0 requests? I guess it would be tricky, since a connection is not established yet, but maybe possible somehow?
We've got a few entry points to a DID Exchange protocol:
- We create an OOB invitation with the handshake protocol
- Received OOB invitation with Handshake protocol declared
- We receive a DID Exchange request through a public DID
- We send a DID Exchange request through a public DID
When we create an OOB invitation, we can include both didexchage/1.0 and didexchange/1.1 in the protocol list. We'll receive a request from them with one version or the other and then proceed from that point with the same version they sent.
When we receive an OOB invitation, if didexchange/1.1 is in the protocol list, we'll use it. Otherwise, we'll go to 1.0.
When we receive a DID Exchange request through a public DID, we will just match the version received in the request.
When we send a DID Exchange request, we could potentially do a discover features protocol with the public DID (I don't think ACA-Py technically supports this right now since there's no connection yet)... But to this point, we've left it up to the controller to orchestrate the flow of multiple protocols. So continuing that pattern, I think we'd add a switch to the request through public DID Admin API endpoint to give the controller the ability to pick which version of the DID Exchange protocol to send. We can default to 1.0.
A lot of progress on this PR. I think we're where we wanted to be implementation wise now. I just need to update the tests that are failing and do some regression checks against Credo to make sure that's still working, too.
@jamshale @amanji @ianco — nice to get some eyes on this one. If we can get this merged, we can get an rc3 out, and an official 0.12.0. That would be good!
Unit tests fixed now. Might be room for some additions to the tests. I've tested this branch against Credo 0.5.0 and tests passed so looking good on that front. I'll run some tests against previous ACA-Py versions.
Diagnosing an issue:
acapy_1 | Traceback (most recent call last):
acapy_1 | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
acapy_1 | result = coro.send(None)
acapy_1 | File "/usr/src/app/aries_cloudagent/core/dispatcher.py", line 210, in handle_message
acapy_1 | await handler(context, responder)
acapy_1 | File "/usr/src/app/aries_cloudagent/protocols/didexchange/v1_0/handlers/complete_handler.py", line 29, in handle
acapy_1 | await mgr.accept_complete(context.message, context.message_receipt)
acapy_1 | File "/usr/src/app/aries_cloudagent/protocols/didexchange/v1_0/manager.py", line 1064, in accept_complete
acapy_1 | conn_rec.my_did = await self.long_did_peer_4_to_short(conn_rec.my_did)
acapy_1 | File "/usr/src/app/aries_cloudagent/connections/base_manager.py", line 115, in long_did_peer_4_to_short
acapy_1 | await wallet.store_did(did_info)
acapy_1 | File "/usr/src/app/aries_cloudagent/wallet/askar.py", line 279, in store_did
acapy_1 | raise WalletDuplicateError("DID already present in wallet")
acapy_1 | aries_cloudagent.wallet.error.WalletDuplicateError: DID already present in wallet
@ianco I seem to be running amok of the reusing did:peer:4 changes. My tests work if I make sure to set the create_unique_did
query param on OOB invite creation. Otherwise, it would reuse the same DID for subsequent connections. Is this the intended behavior? It seems like inverting the create_unique_did
so it's always true would more closely mimic the original behavior of the OOB create invite endpoint.
@ianco I seem to be running amok of the reusing did:peer:4 changes. My tests work if I make sure to set the
create_unique_did
query param on OOB invite creation. Otherwise, it would reuse the same DID for subsequent connections. Is this the intended behavior? It seems like inverting thecreate_unique_did
so it's always true would more closely mimic the original behavior of the OOB create invite endpoint.
That was based on this comment from @swcurran : https://github.com/hyperledger/aries-cloudagent-python/issues/2703#issuecomment-1964411524
I don't have an opinion on what should be the default on the create_unique_did
attribute ...
(Connection reuse won't work unless we use the same DID on each invitation ...)
Diagnosing an issue:
acapy_1 | Traceback (most recent call last): acapy_1 | File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step acapy_1 | result = coro.send(None) acapy_1 | File "/usr/src/app/aries_cloudagent/core/dispatcher.py", line 210, in handle_message acapy_1 | await handler(context, responder) acapy_1 | File "/usr/src/app/aries_cloudagent/protocols/didexchange/v1_0/handlers/complete_handler.py", line 29, in handle acapy_1 | await mgr.accept_complete(context.message, context.message_receipt) acapy_1 | File "/usr/src/app/aries_cloudagent/protocols/didexchange/v1_0/manager.py", line 1064, in accept_complete acapy_1 | conn_rec.my_did = await self.long_did_peer_4_to_short(conn_rec.my_did) acapy_1 | File "/usr/src/app/aries_cloudagent/connections/base_manager.py", line 115, in long_did_peer_4_to_short acapy_1 | await wallet.store_did(did_info) acapy_1 | File "/usr/src/app/aries_cloudagent/wallet/askar.py", line 279, in store_did acapy_1 | raise WalletDuplicateError("DID already present in wallet") acapy_1 | aries_cloudagent.wallet.error.WalletDuplicateError: DID already present in wallet
Is this happening in an integration test?
Possibly aca-py should just check before adding a DID, if the DID is already present ...
To encourage connection reuse where possible, we want an Agent to use the same DID for invitations. The recommendation is that ACA-Py should default to enabling reuse, and a controller must go out of its way to force a new connection for every invitation. Does that make sense and answer the question?
Okay, I think I'm detecting where my assumptions have broken down. To restate, we want to encourage connection reuse by using the same DID for invitations; then, in the process of connecting, the invite creator should rotate out for a new did:peer:4 DID.
Sound right? I think some of my changes are erroneously assuming that a DID associated with the connection means we don't need to create a new one. I believe this was the case before the reuse PR. But with this new condition, I'll have to adjust my logic to make sure the new DID is rotated in for the connection.
Yes — that’s what I think we want. Use the same did:2/4 for every invitation, enabling reuse, and rotate to a new DID when connecting.
And in the super edge case where an Agent wants to actively prevent connection reuse, they have that option to use a new DID on every invitation and (presumably) still rotate to a new DID when connecting.
Quality Gate passed
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
No data about Coverage
No data about Duplication
(Please note my updated summary of changes in the opening comment of this PR)
As a result of these changes, in combination with previously merged PRs, backwards compatibility is influenced by the following factors:
Past == ACA-Py 0.11.0 New == ACA-Py 0.12.0 or ACA-Py with the changes from main + this PR
- New can NOT set
--emit-did-peer-4
or--emit-did-peer-2
and initiate a connection via invitation since this would result in OOB invitations with adid:peer:4
did as a service which is unresolvable to Past.- Therefore, New cannot initiate via invitation DID Exchange 1.1 and did:peer:4 with agents supporting it AND interoperate with Past at the same time.
- New can accept invitations from Past and respond in kind
- New can accept requests from Past and respond in kind
- New can accept invitations from New (or equivalent featured agent) and respond in kind (e.g OOB invite with HS didexchange 1.1) but cannot create a did:peer:2/4 without the flags.
- Therefore, New cannot respond to an invitation from New with did:peer:2/4 AND interoperate with Past at the same time.
- New can accept requests from New (or equivalent featured agent) and respond in kind (with did:peer:2/4 regardless of flag state)
Are these interop constraints we're happy with or should we revisit what triggers the OOB behavior?
w00t!!! Thanks, @dbluhm — awesome work!