kubeapps
kubeapps copied to clipboard
make GetAvailablePackageSummaries more robust to potential plugin failures
there are cases where one plugin can fail when calling GetAvailablePackageSummaries (for example postgres issue for helm plugin, bug in carvel plugin for empty result...), and in that case, the GetAvailablePackageSummaries call itself fails.
it should be possible for the core layer to catch those errors and report them along with the results from other plugins, rather than fail altogether.
Thanks Dimitri. Yeah, I think ~~Greg~~ Antonio and I have discussed this at one point, and given the options between silently ignoring that plugin and returning results from the others, vs exposing the error directly, I'm keen to expose the error directly.
But what you're proposing is I think trying to get the best of both worlds, where we return the results from the plugins without errors and also somehow notify that there was an issue with some plugins? It's tricky to see what the status would be for such a call (did it succeed? what if the plugin error was temporary, so the call for the next page then includes some results while the first didn't etc.)
I need to think about it more, but I still think I'd prefer to return an error if there was an error retrieving the aggregated results. What do others think?
yes, the goal is just to provide more robustness. the GetAvailablePackageSummariesResponse already has several fields. it would be easy to add errors and from which plugin they are, and let the UI display both the summary and a error banner.
I need to think about it more, but I still think I'd prefer to return an error if there was an error retrieving the aggregated results. What do others think?
I have also brought this stuff up in the past, even if I'm 100% for returning any error back to the user I still think everything should fail just because of a tiny error in a single plugin.
Perhaps returning an ABORTED
status code (if the client should retry at a higher-level, such as when a client-specified test-and-set fails which indicates that the client should restart a read-modify-write sequence) instead of an OK, but at least return something.
For example, a real escanrio I faced recently: I installed Kubeapps and then upgraded it via helm plugin but I wanted to manage carvel packages. Due to some incompatibilities with my chart values, the helm plugin was failing. Well, so as a result, I wouldn't be able to neither manage the installed packages via helm nor installing new packages with carvel. I mean, of course, there certainly was something wrong with my kubeapps installation... but I would have expected to have been able to use kubeapps (even if in a degradated mode).
I need to think about it more, but I still think I'd prefer to return an error if there was an error retrieving the aggregated results. What do others think?
I have also brought this stuff up in the past, even if I'm 100% for returning any error back to the user I still think everything should fail just because of a tiny error in a single plugin.
I think you mean "should not" (from your argument below) :)
Perhaps returning an
ABORTED
status code (if the client should retry at a higher-level, such as when a client-specified test-and-set fails which indicates that the client should restart a read-modify-write sequence) instead of an OK, but at least return something.
Yep, aborted could be an option if it's possible to return that status code with a valid result, but I'm not sure it is with grpc? Either the call succeeds and returns your response message, or you instead get an error response. You can have richer errors for more details about the error itself, but I don't think it's normal to return something like "partially ok", though we could enable that in our own message format (though I'm not personally convinced).
For example, a real escanrio I faced recently: I installed Kubeapps and then upgraded it via helm plugin but I wanted to manage carvel packages. Due to some incompatibilities with my chart values, the helm plugin was failing. Well, so as a result, I wouldn't be able to neither manage the installed packages via helm nor installing new packages with carvel. I mean, of course, there certainly was something wrong with my kubeapps installation... but I would have expected to have been able to use kubeapps (even if in a degradated mode).
Yes, I think that's where I'd want to see a blocking error. There was a misconfiguration with your upgrade, and if you didn't see that because you still saw a catalog with carvel packages and went ahead, you might think everything is fine and start letting users use it. The correct action to take at that point would be either to disable the misbehaving plugin or fix the configuration, IMO, not use Kubeapps in a half-working state and getting different results for the same request.
That said, I'm happy to consider alternatives for tolerance with plugin failures if you both feel strongly about it. Worth considering the priority of this too (ie. seems like something that could affect users more when we have community plugins in the future, not necessarily yet)
If we could return results as well as an error code or information that got exposed in the client, that could be nice, but it's still
+1 to showing available results together with any plugin error when it failed (if that is technically possible with Grpc) If e.g. Carvel plugin fails, I would still like to see Helm plugin results. But at the same time I would like to be notified in UI (warning?) that there was something else wrong/failing. If only the error shows and no results, user might think that there is a general failure, which would not be the case in the example. A simile would be like when you order two items online, and only one gets to your door. You want it, but you also want to know what happened to the other one 😄
Perhaps we could dedicate some time to investigate a proper error framework in Grpc-web?
If we could return results as well as an error code or information that got exposed in the client, that could be nice, but it's still
Something missing here at the end of the sentence @absoludity ?
I think I just meant to delete "but it's still" (I had been going to say that it's still going to leave the results inconsistent, since the next page of results may then have some for the problem plugin, but it's not a big deal - having out of order results for a misconfigured/misbehaving plugin).
So I think a majority here agree that being able to return the results for the correctly behaving plugins while also ensuring the client is aware of an issue with certain plugins would be a good thing? So let's move towards that then.
Regarding the status, I think we may be restricted to returning a gRPC OK, but with a modified message that includes some extra info indicating which plugin is erroring? Let's see when someone gets to look more deeply.