aries-cloudagent-python
aries-cloudagent-python copied to clipboard
Add startup flag --light-weight-webhook to trim down outbound webhook payload
When we create a credential offer with image, the image is repeated in following objects (dict) "credential_request", "cred_request", "credential_proposal", "cred_proposal", "credential_offer", "cred_offer", "credential_preview", "cred_preview", "values", "credentials~attach", "offers~attach"
a Credential offer with 30KB image resulted in a webhook sized at 227KB in my use case and resulted HTTP 413 as well as unnecessary stress on WAF rule regex processes on those encrypted string and base64 image string.
This startup flag will remove those objects only on outbound webhook and size went down to 7KB at most. I did preserve 1 object "cred_issue" which has the "rev_reg_id" and "cred_rev_id".
To me this is a more like a nice to have flag if someone ever encounter problem with image and webhook.
Previous related discussion PR #725 Issue #1222
Signed-off-by: Victor Lee [email protected]
Interesting problem. Looking forward to folks reviewing this and the best approach to solving the very real issue. I don't know nearly enough to know what is the right way to handle this, so adding @ianco @dbluhm @frostyfrog @andrewwhitehead to the discussion.
It would be a huge change, but perhaps the webhooks should only send new information to the controller (vs. the full state) and leave it to the controller to hold the data it needs to make decisions as the protocol proceeds. Or do that, and add in an endpoint that says "send all state for protocol X" whenever it is needed by a controller?
Thanks for the PR!
I'll agree that this is an interesting problem and I'm certain it will only get worse when implementors don't take file size into consideration, especially if upper management requests higher resolution imagery. For example, an image from an iPhone that controls embedded 3D data for biometric systems.
This issue, in terms of credentials, is prevalent during not only credential issuance, but verification/proof and endorsement. There are also plugins implementing features (such as the Q and A protocol) which may also cause increased webhook payloads if the system chooses to accept binary or non-fixed width payloads (I'm sure some malicious user is just waiting to send a script from Shakespeare as an answer to "why are you visiting the country").
I believe that what you've got here is a great first step towards solving the issue, but I believe that there's a lot more that can be done overall. Making sure to decrease unnecessarily duplicated data is one thing. Limiting the size of the payload in the face of untrusted data is another. I recommend that, for lightweight payloads, we send the minimal amount of data needed to request the full data where possible. Such as a presentation exchange ID during proof and allowing the system to retrieve the full record manually if they need it. This may be considered a bit extreme, but since we can't quite trust that the third party agent won't send us something massive, I feel as though it's the best compromise to prevent errors.
P.S. I hope that's all readable. Woke up and sent it from my phone. Really interesting problem we've got here :)
Kudos, SonarCloud Quality Gate passed! 
0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells
No Coverage information
0.0% Duplication
Thanks for the comment and attention with great recommendations. I personally didn't like my approach either which I feel like I might removed something useful (although my invitation to offer to issue cred and revoke still works flawlessly, can't say about proofing yet ) and I am new to python, so this is the least destructive mod I can come up with before a deep dive.
Currently I only use webhook for state update and my workaround to get the full record is to manually fetch from rest API with cred_ex_id which give me the exact same thing back when needed.
Hi @swcurran @frostyfrog,
I have reverted my previous approach (which remove stuff from the webhook payload, this is destructive and not safe IMO).
I have make new changes that is cleaner and safe. A new LightWeightWebhook object is created that only target to use on V10CredentialExchange and V20CredExRecord (so all other webhooks are not affected)
The new LightWeightWebhook object contain only state update and essential information. (basically the 1st level of the state protocol object)
This is inline with the idea send the minimal amount of data needed to request the full data where possible.
here is one of the example with the flag on topic=issue_credential_v2_0, state=credential-issued
{
"auto_issue": false,
"auto_offer": false,
"connection_id": "0f75b936-1f0b-4db5-aa80-72e85cc08fce",
"cred_ex_id": "7b48e9c2-dd31-4205-87c6-fdbb10efaf17",
"initiator": "self",
"role": "issuer",
"state": "credential-issued",
"thread_id": "b4fd1123-d8ef-4eae-ade9-6ef7b3ea7af8"
}
@ianco could you take a look at this one? Seems valuable to me.
@ianco could you take a look at this one? Seems valuable to me.
ok will do.
Would it make sense to make this a breaking change with a “—useOldStyleCredExchangeProtocolStateObject” (or something equally weird) for backwards compatibility? It seems this is something we would rather have than not.
We are doing a 0.8.0 breaking release change Real Soon Now, so this would fit there.
I think the default is the old style verbose webhooks, no? so not a breaking change ... (unless you want the default to be the more terse format)
PS failing test: FAILED aries_cloudagent/messaging/models/light_webhook.py::FLAKE8
I’m suggesting that we want the default to be the terse form, and only if needed use the long form. AFAIK, all of the data is available in the long form — it’s just the extra copies of the data that are in the short form.
Put another way, we could be smarter about not repeating content in the Protocol State Object by blindly adding in each message, where the message repeats data from a previous message.
Or is that a bad idea? I guess things could be lost if the content changes across messages — then we would only have the latest version vs. all versions. How is that handled in this case? Is there checking that the content hasn’t changed between messages, and if so what is done?
I’m suggesting that we want the default to be the terse form, and only if needed use the long form.
I think this would be fine, however if we do this we should probably implement a terse form for the presentation exchange as well (this PR only addresses the cred exchange). (I think all the other exchanges are prob ok as is)
So maybe go with this now, and add an Issue to change the presentation model as well, and then change to terse by default in the next breaking release? In theory, we could get it in for the final 0.8.0.
I am on it now should be done shortly. I will put 2 separate classes for the v1 and v2 cred exch in message package.
When we have a plan to make it default to be the terse form, maybe in future release we can tied long form to other flag like --debug or ---trace.
sorry my bad, still getting use to black and flake8...ha...
@ianco @swcurran Thank you very much! I think next step is to create new issue on proof exchange webhook and terse form by default so that we can carry the discussion further.
Thanks @victorlee0505 — agree on that as the next step.
