fabric-samples Define best practices for handling errors in chaincode

Note: this is related to the REST sample review and recent discussions on error handling during Fabric Gateway development which have resulted in some changes to chaincode implementations, e.g. Java chaincode.

The asset transfer samples are currently the primary chaincode samples and they all fail with human readable errors if the asset exists when it shouldn't and vice versa. This kind of text error is not good practice and makes error handling in client applications more difficult. As if to prove the point, the samples use slightly different error messages in the different language implementations.

The underlaying interface allows chaincode implementations to return errors with an error code, error message, and error payload, however due to lack of support in the current chaincode and client SDK implementations, I think the error message is the only piece of information you can rely on.

Ignoring the current implementations, how should chaincode report errors, and "business logic" errors in particular?

Being able to use a domain specific error code, human readable message, and domain specific payload all seem potentially useful to me, e.g. error 2035 might indicate that an asset cannot be updated because it is currently being held for inspection, and a payload with the asset's current state, or inspection details could be useful.

One possibility is not to return application/business level "errors" as errors at all. In this scenario the response payload would include the application specific response code, plus any asset, etc. essentially layering another level of error handling on top. It would mean that the failure would be ordered and included on the blockchain for any transactions that are submitted instead of evaluated. That might be desirable for audit purposes in some situations, although it's worth noting that as far as I know there is no way to force transactions to be submitted for ordering. (Potentially client applications could decide whether or not to submit a transaction for ordering based on the response payload?)

Any comments, suggestions, or recommendations?!

See related core Fabric issue: hyperledger/fabric#3154

Nov 24 '21 10:11 jt-nti

It's a tricky area and you raise some excellent points. I don't have a clear answer but I'll add some thoughts that might help get closer to the right direction.

Returning good responses with some application-defined payload for business logic error conditions makes perfect sense for REST applications and works fine for the evaluate flow. For example, get details of an asset with a specified ID could return an empty result if the asset doesn't exist. But it makes less sense for Fabric in general since, as you rightly point out, a success response in the submit flow means successful endorsement and OK to go ahead and update the ledger.

The status codes are not very consistently handled or reported back from chaincode to client, and I suspect the behaviour is different in different SDK language implementations. HTTP status codes don't necessarily fit all application-level conditions, and Fabric status codes, while modelled on them, are not really HTTP status codes anyway. Today they tend to have some meaning to the Fabric runtime so I'm not sure whether assigning application-level meaning to them (certainly beyond standard Fabric and/or HTTP status codes) will work well.

gRPC errors provide another set of status codes, which are used by Fabric Gateway and its clients for Fabric / infrastructure errors, but they don't really fit arbitrary application-level error response conditions, and aren't exposed to chaincode or client applications through existing SDKs anyway.

The Response message structure payload field is documented as being for metadata associated with the response. Perhaps that is the most sensible place for application-level machine readable response information alongside the human readable message field.

To help guide the best solution, it would be really good to have a more concrete picture of specific scenarios where it is necessary to have a machine readable application-level error response though. What actual application-level failure conditions are there where the business application will interrogate the endorsement error details and then react differently depending on the error?

Nov 30 '21 16:11 bestbeforetoday

It might be helpful to try and classify the error scenarios from the perspective of the client application. Thinking along the lines of @bestbeforetoday final point.

Connectivity/Service Availability: something is offline, or unable to establish the networking connection Can try and reconnect, but more than likely won't help- time to raise a support ticket (somewhere)
Application-level error: from the chaincode something like 'asset not found' - 'you can't transfer that asset' (is this what you were referring to as 'endorsement error details' @bestbeforetoday ? At the very least, returning more information (eg in the payload) would be the minimum extensions would suggest... (note that the Java chaincode has the APIs to do this today - but as @jt-nti notes, it's thrown away somewhere)
Retryable Errors - these are the Fabric Specific errors primarily MVCC conflicts

Dec 02 '21 11:12 mbwhite

Here's is the code (that you can write today) in Java; that should be seen by the client as failure for the SmartContract to 'endorse/complete' for business reasons.

    @Transaction(intent = TYPE.SUBMIT)
    public void  payloadChaincodeException(final Context ctx){
        String payload = String.format("{\"error\":{\"code\":404,\"owner\":\"MrAnon\"}}");
        ChaincodeException cce = new ChaincodeException("[ErrorContract] Payload & Message",payload);

        throw cce;
    }

Dec 02 '21 11:12 mbwhite

It looks like the different asset transfer samples also have different approaches to error handling. For example, in places (ReadTransferAgreement) the asset transfer private data sample just logs an error and returns null if it cannot find the private data, with no indication for client apps as to why.

Dec 15 '21 11:12 jt-nti