firecracker-containerd icon indicating copy to clipboard operation
firecracker-containerd copied to clipboard

Propagate firecracker-containerd specific errors to clients

Open sipsma opened this issue 4 years ago • 7 comments

Currently, clients using the fc-control client are able to receive grpc error codes in error objects returned by API calls, but those are very generic error types that don't communicate more specific information back to the caller. More specific information is at minimum extremely useful for integration testing (allows you to assert the correct failure occurred), but is also potentially valuable for real users who want to know what specifically went wrong for specific error-handling.

I spent a day trying various ways of implementing this as part of the drive mount support (where we want to assert on specific errors occuring in integ tests), but all the ways I tried either didn't work or became too large an effort:

  1. I tried updating APIs to return an actual response object (instead of Empty) that contained a field with an error type enum. The enum field would be set to a specific error code whenever a service returned a non-nil error. This was a nice interface for both client and server, but didn't work in practice because ttrpc (and possible grpc too, have not checked) will ignore any response object returned by a service if the service also returns a non-nil error (setting the response to nil). This makes the interface much uglier and difficult to manage correctly for both services and clients.
  2. grpc status objects contain a Details field, which seems to be potentially useful for returning typed details of what error occurred. However, the interface is hard to use as it requires the details be protobuf messages (not just strings) and adding details to a status object itself can return an error (so we can get an error trying to generate an error object)... This may be viable but might be a large effort.
  3. We can include specific error codes in the grpc status object Message field, but this leaves clients to check error types via string matching (which is very fragile). We could also consider structuring of messages (with associate serialization and deserialization), though that's a large effort and feels a bit odd because we'd be doing ser/der within our already ser/der'd protobuf messages.

Right now, I think figuring out how to use the Details field of the grpc status object may be the most promising, but this needs more investigation.

sipsma avatar Oct 17 '19 19:10 sipsma

Here are some generic error protos that are meant to go into that details field.

IRCody avatar Oct 17 '19 21:10 IRCody

Here are some generic error protos that are meant to go into that details field.

Ah interesting, I didn't know those existed! Yeah maybe we could look into re-using those or maybe writing our own message with an error type enum (similar to what I described in "1." in the original post but w/out putting them in a service response). What matters at the end of the day is that we can write some easy-to-use util library for including those in errors (server-side) and parsing them from errors (client-side).

As long as we can reasonably deal with the issue where WithDetails can itself return an error, this may be a good option.

sipsma avatar Oct 17 '19 21:10 sipsma

It looks like that error happens only if the value for the status is OK.

IRCody avatar Oct 17 '19 21:10 IRCody

It looks like that error happens only if the value for the status is OK.

I think it can here too unless it's the case that MarshalAny will always return a nil error.

It's entirely possible that we can deal with tihs by just not including Details in the case where an error is returned. Provided the cases where an err is returned are very obscure or can be fully tested, then that should be fine IMO.

sipsma avatar Oct 17 '19 21:10 sipsma

I'm just blind and missed that somehow.

I think that makes sense though.

re: sending actual responses instead of empty responses: Is this something we might want to do anyways so we can add values here in the future in a backwards compatible way?

IRCody avatar Oct 17 '19 21:10 IRCody

re: sending actual responses instead of empty responses: Is this something we might want to do anyways so we can add values here in the future in a backwards compatible way?

Yeah that's true, we just can't rely on them for communicating extra information when an error occurs, but besides that it's probably good to have our own response types even if most of them are empty at first. That's probably a separate issue from this one but definitely something to consider as we update all this.

sipsma avatar Oct 18 '19 17:10 sipsma

Yea I just thought of it when looking at this. It's not directly tied to this issue.

IRCody avatar Oct 18 '19 23:10 IRCody