Fix: Problem Report webhook - tails file upload failure when creating cred_def
- resolve #1743
- @PaulWen This will dispatch a webhook [topic: acapy::problem_report] with the payload containing cred_def_id [pthid], revoc_reg_id [thid] and the reason. I think you expected this information to be presented in either POST /credential-definitions or the /topic/revocation_registry webhook. Within POST /credential-definitions, the cred_def is written to the wallet and ledger, but the upload of the tails file is done separately as a registered event [EventBus], so it is difficult to show the error there. As for the /topic/revocation_registry webhook, based on what I found, it is only triggered for the [init, generated, posted, active] states.
- Regarding ACA-Py logs, this was already implemented and you should have seen something like:
2022-04-27 21:04:45,521 aries_cloudagent.core.event_bus ERROR Error occurred while processing event
Traceback (most recent call last):
  File "/home/indy/aries_cloudagent/core/event_bus.py", line 120, in notify
    await processor()
  File "/home/indy/aries_cloudagent/revocation/routes.py", line 1274, in on_revocation_tails_file_event
    raise RevocationError(err_msg)
aries_cloudagent.revocation.error.RevocationError: Tails file for rev reg W1dd41kbhMNopR5h38PWsm:4:W1dd41kbhMNopR5h38PWsm:3:CL:8:test205:CL_ACCUM:1d809179-3108-4f53-ba1b-a8c57a87d14c failed to upload: Exceeded maximum put attempts
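On the controller side, pulling the identifiers out of the problem report webhook could look roughly like the sketch below. The exact payload layout is an assumption here: the description above only says that pthid carries the cred_def_id, thid carries the revoc_reg_id, and a reason is included. The function name `summarize_problem_report`, the `~thread` nesting, and the `description`/`explain` keys are hypothetical and should be checked against a real payload.

```python
from typing import Any, Dict, Optional


def summarize_problem_report(payload: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Extract cred_def_id, rev_reg_id and the reason from a webhook payload.

    ASSUMPTION: pthid/thid sit either under a "~thread" decorator or at the
    top level, and the reason is under "description" (optionally keyed by
    language) or "explain". None of this is confirmed by the PR text.
    """
    thread = payload.get("~thread") or {}
    description = payload.get("description")
    if isinstance(description, dict):
        reason = description.get("en")
    else:
        reason = description or payload.get("explain")
    return {
        # pthid carries the credential definition id (per the PR description)
        "cred_def_id": thread.get("pthid") or payload.get("pthid"),
        # thid carries the revocation registry id (per the PR description)
        "rev_reg_id": thread.get("thid") or payload.get("thid"),
        "reason": reason,
    }
```

Missing fields come back as None rather than raising, so a controller can still log the report when the layout differs from what is assumed.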
@shaangill025 Thank you very much for the investigation and quick solution! I must have overlooked the log message and will double-check on that. Also, I will try out the webhook-based error notification. Is it /topic/problem_report or /topic/acapy::problem_report?
Regarding the states of the revocation registry: Isn't it a little bit confusing that the state is set to active before the tails file is uploaded to the tails server? Would it be possible to add an intermediate state or only set it to active once the tails file was successfully uploaded?
Is it /topic/problem_report or /topic/acapy::problem_report?
/topic/problem_report
I double-checked whether I can see a log message similar to "Tails file for rev reg [...] failed to upload", and it is indeed being logged.
I just tested it and it works great! I am receiving the error messages via /topic/problem_report
Would it also be possible to send a message to /topic/revocation_registry/, maybe with state: error and errorMsg: "Tails file for rev reg [...] failed to upload", similar to how it is done for present_proof or issue_credential?
This would have the advantage that cred_def_id and record_id could also be provided in a structured way, making the error easier to parse and process.
But I would also be happy with the current implementation! :)
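To make the suggestion concrete, a structured error payload could be shaped roughly as in the sketch below. This is purely illustrative: ACA-Py does not emit such a webhook according to this thread, and the `error` state, the `errorMsg` key, and the helper name `build_rev_reg_error_payload` are all hypothetical.

```python
from typing import Any, Dict


def build_rev_reg_error_payload(
    record_id: str, cred_def_id: str, revoc_reg_id: str, error_msg: str
) -> Dict[str, Any]:
    """Build a HYPOTHETICAL /topic/revocation_registry error payload.

    The "error" state and every key name below are assumptions taken from
    the suggestion in this comment, not from ACA-Py itself.
    """
    return {
        "state": "error",
        "errorMsg": error_msg,
        "record_id": record_id,
        "cred_def_id": cred_def_id,
        "revoc_reg_id": revoc_reg_id,
    }
```

With the identifiers as separate fields, a controller would not need to parse them out of a free-text error message.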
I'd also be interested in what we do about the problem after reporting it. If we fail to create a rev_reg, can the cred_def be used at all? What does a recovery look like? Since ACA-Py "hides" the details of creating rev-regs when needed, what happens next?
Some ideas:
- Needed: Float the error up as loudly as possible 📢 so that an administrator looks for what the problem is. Almost certainly, we need to let people know they have to watch for this error and fix the underlying cause.
- Option: Continue to retry until solved -- however long it takes. However, how does this work across restarts?
- How do we deal with a request that must be handled by one instance (can't have multiple instances doing this), but where one process taking the task may not be able to complete the task?
- Option: Make the tails file service robust enough to not worry about this. Seems like a bad idea to rely on that.
- Option: Retry for a bit and then create the tails file locally. This requires the ACA-Py service to act as the backup tails file server -- including being configured for all the features of a web service. But the whole point of a tails service is that ACA-Py does not have to be that.
- Option: Configure multiple tails servers so that if one is down, others are available.
- Other ideas?
The latter ideas require publishing the tails file BEFORE writing to the ledger.
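As a rough illustration of the "retry" and "multiple tails servers" options combined, the sketch below tries each server a bounded number of times with exponential backoff before moving on to the next. It is not how ACA-Py uploads tails files: `upload_with_failover`, the `upload` callable, and all parameter names are hypothetical, and it does not address the restart or multi-instance questions raised above.

```python
import time
from typing import Callable, Optional, Sequence


def upload_with_failover(
    upload: Callable[[str], None],
    servers: Sequence[str],
    attempts_per_server: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each tails server in order; return the one that accepted the file.

    `upload` is a HYPOTHETICAL callable that raises on failure. `sleep` is
    injectable so the backoff can be skipped in tests.
    """
    last_error: Optional[Exception] = None
    for server in servers:
        for attempt in range(attempts_per_server):
            try:
                upload(server)
                return server
            except Exception as err:  # any failure counts as a failed attempt
                last_error = err
                # Back off before retrying the same server, but not before
                # switching to the next one.
                if attempt < attempts_per_server - 1:
                    sleep(base_delay * 2 ** attempt)
    # Every server was exhausted: surface the failure loudly to the caller.
    raise RuntimeError("Tails file upload failed on all servers") from last_error
```

Per the note above, a scheme like this only helps if the upload happens before the registry is written to the ledger; otherwise the final RuntimeError still leaves a published registry without a tails file.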
It is definitely a tough one. If I am not mistaken, ACA-Py already makes multiple attempts to upload the file before giving up. Therefore, I guess human intervention will always be required in this situation (e.g. to fix the tails server).
My idea would be to require that the creation of a new RevReg be triggered manually via the API. Further, if no other RevReg is available anymore, an error should be returned when trying to issue another credential.
Closing this as stale. Reference the issue #1743 to reimplement as needed.