aries-cloudagent-python icon indicating copy to clipboard operation
aries-cloudagent-python copied to clipboard

Combination of terse web hooks without --preserve-exchange-records makes completing a present proof 2.0 impossible

Open Gavinok opened this issue 1 year ago • 26 comments

As the verifier it is impossible to complete a present proof 2.0 transaction without the --preserve-exchange-records flag.

When the webhook confirming a proof request has been completed acapy will already delete the record before the controller can retrieve it.

Gavinok avatar Jun 25 '24 15:06 Gavinok

To confirm — the issue is not that ACA-Py has already deleted the record before the controller can retrieve it — that is expected/intended behaviour. The controller shouldn’t be trying to retrieve the record after the end of the protocol. Rather the issue is that the “terse" webhook that was added at some point into the code base (for good reason — it carries too much information) does not include the required information for the verifier — the presentation — that is needed by the controller.

Do I have that right?

swcurran avatar Jun 25 '24 16:06 swcurran

Can we replicate this with the demo? If so what are the steps?

I removed the preserve-exchange-records flag from the faber and alice demo and could still complete the proof despite the presentation record getting deleted by both agents.

jamshale avatar Jun 25 '24 17:06 jamshale

You just need to check the webhook to see what data comes back in it. If it is insufficient, it is a problem. The timing issue is not the problem. The controller/verifier should NOT be trying to retrieve the record via the API when that flag is not set.

swcurran avatar Jun 25 '24 17:06 swcurran

@swcurran that is correct. @jamshale this didn't pop up in the demo. It is using connections vs our current work with vc-authn is using OOB. That said I don't think that is the source of the problem.

I see that the response to the verify-presentation endpoint does contain the presentation. We could use this instead of the recommended process of retrieving the presentation after the the state changes to done.

Gavinok avatar Jun 25 '24 19:06 Gavinok

The issue is that in our case we don't even seem to manage to verify the presentation before it is deleted. Her are some of our logs

controller-1     | 2024-06-25T19:34:52.470174Z [info     ] GOT A PRESENTATION, TIME TO VERIFY [api.routers.acapy_handler]
controller-1     | 2024-06-25T19:34:52.470306Z [debug    ] >>> verify_presentation        [api.core.acapy.client]
controller-1     | Starting new HTTP connection (1): aca-py:8077
aca-py-1         | 2024-06-25 19:34:52,475 aries_cloudagent.admin.server ERROR Handler error with exception: Record not found: pres_ex_v20/714760ae-3c25-40b3-8ae9-a6413fc9c431.
aca-py-1         | 
aca-py-1         | =================
aca-py-1         | Traceback (most recent call last):
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/protocols/present_proof/v2_0/routes.py", line 1270, in present_proof_verify_presentation
aca-py-1         |     pres_ex_record = await V20PresExRecord.retrieve_by_id(session, pres_ex_id)
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/messaging/models/base_record.py", line 230, in retrieve_by_id
aca-py-1         |     result = await storage.get_record(
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/storage/askar.py", line 92, in get_record
aca-py-1         |     raise StorageNotFoundError(f"Record not found: {record_type}/{record_id}")
aca-py-1         | aries_cloudagent.storage.error.StorageNotFoundError: Record not found: pres_ex_v20/714760ae-3c25-40b3-8ae9-a6413fc9c431
aca-py-1         | 
aca-py-1         | The above exception was the direct cause of the following exception:
aca-py-1         | 
aca-py-1         | Traceback (most recent call last):
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/admin/server.py", line 181, in ready_middleware
aca-py-1         |     return await handler(request)
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/admin/server.py", line 218, in debug_middleware
aca-py-1         |     return await handler(request)
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/admin/server.py", line 451, in setup_context
aca-py-1         |     return await task
aca-py-1         |   File "/usr/local/lib/python3.9/asyncio/futures.py", line 284, in __await__
aca-py-1         |     yield self  # This tells Task to wait for completion.
aca-py-1         |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
aca-py-1         |     future.result()
aca-py-1         |   File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
aca-py-1         |     raise self._exception
aca-py-1         |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
aca-py-1         |     result = coro.send(None)
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aiohttp_apispec/middlewares.py", line 45, in validation_middleware
aca-py-1         |     return await handler(request)
aca-py-1         |   File "/home/aries/.local/lib/python3.9/site-packages/aries_cloudagent/protocols/present_proof/v2_0/routes.py", line 1272, in present_proof_verify_presentation
aca-py-1         |     raise web.HTTPNotFound(reason=err.roll_up) from err
aca-py-1         | aiohttp.web_exceptions.HTTPNotFound: Record not found: pres_ex_v20/714760ae-3c25-40b3-8ae9-a6413fc9c431.
aca-py-1         | 2024-06-25 19:34:52,476 aiohttp.access INFO 172.24.0.6 [25/Jun/2024:19:34:52 +0000] "POST /present-proof-2.0/records/714760ae-3c25-40b3-8ae9-a6413fc9c431/verify-presentation HTTP/1.1" 404 288 "-" "python-requests/2.32.3"

It is surprising that this doesn't happen with the demo

Gavinok avatar Jun 25 '24 19:06 Gavinok

Are you saying that ACA-Py doesn’t verify the presentation, or that the controller doesn’t? We need to be pretty clear on that. AFAIK:

  • the presentation should arrive as a message.
  • ACA-Py should invoke the presentation verification code.
  • ACA-Py should post the webhook to the controller with all information needed by the controller.
  • The controller should use the webhook data for all post-verification processing, saving, etc.
  • ACA-Py should (at its leisure) delete the protocol state record, since the controller should never retrieve it.
  • The controller shouldn’t ever attempt to retrieve the protocol state record.

That flow MUST be used when the deployment does not set the —preserve-exchange-records flag, and SHOULD be the flow when it does set that flag.

swcurran avatar Jun 25 '24 19:06 swcurran

Sorry I should clarify.

As of now the demo controller simply uses the verify-presentation end point and waits for the web hook to indicate that verification has completed. It does not investigate what the presentation actually contains.

The response to verify-presentation did seem to contain a presentation. This however seems to regularly fail due to the record being deleted before verification has even completed. There seems to be a race condition here since the controller is sometimes able to verify the presentation.

The web hook never contains the presentation it's self.

  • ACA-Py should invoke the presentation verification code.

This is likely the case and the reason that when the controller manually verify the record is already deleted leading to the above stack trace.

Gavinok avatar Jun 25 '24 20:06 Gavinok

Hmm. As far as I could tell the webhook response always contained the required presentation result in the demo at least. There's some code here https://github.com/hyperledger/aries-cloudagent-python/blob/main/aries_cloudagent/protocols/present_proof/v2_0/models/pres_exchange.py#L205 related to the generated webhook payload. I was actually looking at it, because the debug.webhooks config being false actually seems to not remove the extra data which I found strange.

The above logs look like the controller is using the rest api..?

jamshale avatar Jun 25 '24 20:06 jamshale

Hmm strange, I will run the demo to compare. Maybe there is some code in the demo I am missing. As of now I do see that class being returned but no presentation is actually included. These are the different web hook results coming from vc-authn

web hook: {'role': 'verifier', 'initiator': 'self', 'auto_present': False, 'auto_verify': True, 'thread_id': '584ced18-8719-4523-a516-221faaea000c', 'state': 'request-sent', 'trace': False, 'created_at': '2024-06-25T20:53:59.148532Z', 'updated_at': '2024-06-25T20:53:59.148532Z', 'pres_ex_id': 'b3fb345b-afa4-474f-8427-baf638c0e578'} [api.routers.acapy_handler]
web hook: {'role': 'verifier', 'initiator': 'self', 'auto_present': False, 'auto_verify': True, 'thread_id': '584ced18-8719-4523-a516-221faaea000c', 'state': 'presentation-received', 'trace': False, 'created_at': '2024-06-25T20:53:59.148532Z', 'updated_at': '2024-06-25T20:54:08.539665Z', 'pres_ex_id': 'b3fb345b-afa4-474f-8427-baf638c0e578'} [api.routers.acapy_handler]
web hook: {'role': 'verifier', 'initiator': 'self', 'auto_present': False, 'auto_verify': True, 'thread_id': '584ced18-8719-4523-a516-221faaea000c', 'state': 'deleted', 'trace': False, 'verified': 'true', 'verified_msgs': [], 'created_at': '2024-06-25T20:53:59.148532Z', 'updated_at': '2024-06-25T20:54:13.342049Z', 'pres_ex_id': 'b3fb345b-afa4-474f-8427-baf638c0e578'} [api.routers.acapy_handler]
web hook: {'role': 'verifier', 'initiator': 'self', 'auto_present': False, 'auto_verify': True, 'thread_id': '584ced18-8719-4523-a516-221faaea000c', 'state': 'done', 'trace': False, 'verified': 'true', 'verified_msgs': [], 'created_at': '2024-06-25T20:53:59.148532Z', 'updated_at': '2024-06-25T20:54:13.342049Z', 'pres_ex_id': 'b3fb345b-afa4-474f-8427-baf638c0e578'} [api.routers.acapy_handler]

Gavinok avatar Jun 25 '24 20:06 Gavinok

Let me know if I am missing something

Gavinok avatar Jun 25 '24 20:06 Gavinok

The above logs look like the controller is using the rest api..?

Ya we can ignore that error since as mentioned we want to make all the info in the webhook it's self so this stack trace was irrelevant.

Gavinok avatar Jun 25 '24 21:06 Gavinok

Maybe the vc-authn controller is parsing out the presentation somehow?

I'd check out the demo with debug logs and different configs and see if it's what you are expecting. Unless, I was doing something wrong, the webhook payload I was seeing contained a lot more information.

jamshale avatar Jun 25 '24 21:06 jamshale

Ok, I will give that a try

Gavinok avatar Jun 25 '24 21:06 Gavinok

I don’t like that there is an endpoint to verify the presentation. That should happen automagically on receipt. Is the presentation not verified until that is called? If that is the case, the protocol state object should not be removed until that endpoint is called.

swcurran avatar Jun 25 '24 21:06 swcurran

By default it is done automatically (at least with 2.0) then it is deleted. Manual verification doesn't seem to be necessary if you have auto verification set or leave it unassigned.

Gavinok avatar Jun 25 '24 22:06 Gavinok

So, what I was doing with the webhook payload was my fault. The dockerfile for the demo had a random env variable setting debug-webhooks to true that overrides the command line argument. Gavin figured that out for me.

So, the payload without debug-webhooks basically causes automatic flows with webhooks to fail. Looks like we need to decide what exactly should be in the payload without the debug setting and what shouldn't.

jamshale avatar Jun 26 '24 18:06 jamshale

Good stuff. Agree — we need to fix the webhook. @Gavinok — can you take a pass at what you think needs to be in the payload for a random verifier?

swcurran avatar Jun 26 '24 18:06 swcurran

Certainly, I'm just digging through the debug web hook now. This may take some time to determine what fields are necessary with the 2.0 format

Gavinok avatar Jun 26 '24 18:06 Gavinok

I'm just looking into this now (I was away last week) but I don't think the presentation exchange record should be deleted before the presentation is verified, so maybe that's the solution, rather than tinkering with the webhook payload contents?

ianco avatar Jul 02 '24 18:07 ianco

The presentation exchange record is deleted upon sending or receiving an ACK:

https://github.com/hyperledger/aries-cloudagent-python/blob/main/aries_cloudagent/protocols/present_proof/v2_0/manager.py#L435

ianco avatar Jul 02 '24 18:07 ianco

I don’t like that there is an endpoint to verify the presentation. That should happen automagically on receipt. Is the presentation not verified until that is called? If that is the case, the protocol state object should not be removed until that endpoint is called.

There is a aca-py setting whether to auto-verify or not.

The record deletion looks like it happens when the ACK is sent (or received) and this is triggered by the verification.

So, if auto-verify is set, then the verification will happen automatically, the ACK will be sent and the record deleted.

If auto-verify is not set, then the verify-endpoint must be called, at which point the ACK will be sent and the record deleted.

So in any case, the record isn't deleted until after verification.

ianco avatar Jul 02 '24 18:07 ianco

The webhook is emitted automatically, when the pres_exch record is saved, here:

https://github.com/hyperledger/aries-cloudagent-python/blob/main/aries_cloudagent/protocols/present_proof/v2_0/models/pres_exchange.py#L200

The contents of the "abbreviated" webhook are defined here:

https://github.com/hyperledger/aries-cloudagent-python/blob/main/aries_cloudagent/protocols/present_proof/v2_0/messages/pres_webhook.py

... so it looks relatively straightforward to add the presentation to the webhook, if this is what we want.

ianco avatar Jul 02 '24 18:07 ianco

I don’t understand why there would need to be controller involvement in the verification. If a presentation is received in response to a presentation request, I would think that ACA-Py would verify it and put into the web-hook the presentation and result. That should certainly be the case for --auto-verify, but I don’t know why there would be any other setting.

Given how you have described it, if --auto-verify is off, then definitely, the protocol state object should not be deleted until the controller calls the verification endpoint. But that would seem like a waste of time for everyone — especially the controller/user.

swcurran avatar Jul 02 '24 19:07 swcurran

The field that was needed in our case was the "by_format" field from the webhook when the state transition switches to "done" https://github.com/bcgov/vc-authn-oidc/blob/88d16bc669186d80c4986e2583a495575b989994/oidc-controller/api/routers/acapy_handler.py#L51

Gavinok avatar Jul 02 '24 19:07 Gavinok

So, what I was doing with the webhook payload was my fault. The dockerfile for the demo had a random env variable setting debug-webhooks to true that overrides the command line argument. Gavin figured that out for me.

So, the payload without debug-webhooks basically causes automatic flows with webhooks to fail. Looks like we need to decide what exactly should be in the payload without the debug setting and what shouldn't.

@jamshale how do you get the demo to use abbreviated webhooks? For me even when I comment out that setting I get the full webhooks in the events :-S

ianco avatar Jul 02 '24 20:07 ianco

@ianco This env variable was overriding the command line argument. https://github.com/hyperledger/aries-cloudagent-python/blob/main/docker/Dockerfile.demo#L7

And then I also commented out the command line argument https://github.com/hyperledger/aries-cloudagent-python/blob/main/demo/runners/support/agent.py#L588

jamshale avatar Jul 02 '24 21:07 jamshale