rudr

Health scope should provide APIs that return descriptive errors when deployments fail

Open suhuruli opened this issue 5 years ago • 3 comments

  • The health scope should have APIs that I can query after a deployment fails (and assuming the YAML was valid), that will describe to me what went wrong with the deployment.

  • For example, if I deploy an ApplicationConfiguration and there is an error, the Health Scope should be queryable and return information on which components failed to come up and some deeper information as to why.

  • For instance, I run kubectl describe healthscope and it returns a description of the scope and the problems within that scope.
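To make the request concrete, here is a minimal sketch of the kind of answer such an API could return. Everything in it is hypothetical: the ComponentHealth and ScopeReport types, their field names, and the sample failure reason are illustrations only, not part of rudr or the OAM spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComponentHealth:
    # Hypothetical per-component record; not a rudr or OAM type.
    instance_name: str
    healthy: bool
    reason: Optional[str] = None  # why the component failed, if it did


@dataclass
class ScopeReport:
    # Hypothetical aggregate that a health scope could expose.
    scope_name: str
    components: List[ComponentHealth] = field(default_factory=list)

    def failed(self) -> List[ComponentHealth]:
        """Return only the components that did not come up."""
        return [c for c in self.components if not c.healthy]

    def describe(self) -> str:
        """Render a short, human-readable summary of the scope."""
        failures = self.failed()
        healthy = len(self.components) - len(failures)
        lines = [
            f"Scope: {self.scope_name}",
            f"Healthy: {healthy}/{len(self.components)}",
        ]
        for c in failures:
            lines.append(f"  FAILED {c.instance_name}: {c.reason or 'unknown'}")
        return "\n".join(lines)
```

The point of the shape is that the caller gets both the list of failed components and a reason per component in one query, instead of having to piece it together from separate resources.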

suhuruli avatar Nov 27 '19 16:11 suhuruli

I have opened a PR (#473) that addresses how we report errors back to the user. My answer is k8s events.

That PR also includes an example: when I deploy an appconfig that has something wrong with it, I can use kubectl describe to see what went wrong.

$ kubectl describe cfg first-app
Name:         first-app
Kind:         ApplicationConfiguration
...
Spec:
  Components:
    Component Name:  helloworld-python-v1
    Instance Name:   first-app-helloworld-python-v1
...
Events:
  Type     Reason                                                                                     Age   From  Message
  ----     ------                                                                                     ----  ----  -------
  Warning  ApiError NotFound ("componentschematics.core.oam.dev \"helloworld-python-v1\" not found")  3s          creating AppConfig first-app error

Yeah, I know providing an API from the health scope would be better, but I don't think this is a high-priority issue.
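Since the errors surface as ordinary k8s events, they can also be read by a script rather than by eye. The sketch below assumes the events were dumped with kubectl get events -o json and filters them by the standard Event fields (type, reason, message, involvedObject.name). The warnings_for helper is a hypothetical name, and the sample payload is hand-written: the Warning entry reuses the values from the describe output above, while the Normal entry is made up to show the filtering.

```python
import json
from typing import Dict, List


def warnings_for(events_json: str, name: str) -> List[Dict[str, str]]:
    """Pick the Warning events whose involvedObject is called `name`."""
    items = json.loads(events_json).get("items", [])
    return [
        {"reason": e.get("reason", ""), "message": e.get("message", "")}
        for e in items
        if e.get("type") == "Warning"
        and e.get("involvedObject", {}).get("name") == name
    ]


# Hand-written sample shaped like `kubectl get events -o json` output.
sample = json.dumps({
    "items": [
        {
            "type": "Warning",
            "reason": r'ApiError NotFound ("componentschematics.core.oam.dev '
                      r'\"helloworld-python-v1\" not found")',
            "message": "creating AppConfig first-app error",
            "involvedObject": {"kind": "ApplicationConfiguration",
                               "name": "first-app"},
        },
        {
            # Made-up entry, present only so the filter has something to skip.
            "type": "Normal",
            "reason": "Example",
            "message": "illustrative non-warning event",
            "involvedObject": {"kind": "ApplicationConfiguration",
                               "name": "first-app"},
        },
    ]
})

for w in warnings_for(sample, "first-app"):
    print(w["message"], "<-", w["reason"])
```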

wonderflow avatar Nov 28 '19 02:11 wonderflow

I am troubleshooting a similar issue, but I see no events at this point with the alpha 1 version. Is there any way I can get the warning events you got from k8s today? Mine are empty. There are no deployments for this instance yet.

$ kubectl describe cfg bikesharing-app
Name:         bikesharing-app
Namespace:    default
Labels:
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"core.oam.dev/v1alpha1","kind":"ApplicationConfiguration","metadata":{"annotations":{},"name":"bikesharing-app","namespace":...
API Version:  core.oam.dev/v1alpha1
Kind:         ApplicationConfiguration
Metadata:
  Creation Timestamp:  2020-01-07T18:05:21Z
  Generation:          1
  Resource Version:    3149582
  Self Link:           /apis/core.oam.dev/v1alpha1/namespaces/default/applicationconfigurations/bikesharing-app
  UID:                 45010f4e-3178-11ea-a462-7a73f2d3d989
Spec:
  Components:
    Component Name:  bikesharing-ui-v1
    Instance Name:   bikesharing-ui
    Traits:
      Name:  ingress
      Parameter Values:
        Name:   hostname
        Value:  bikesharing.com
        Name:   path
        Value:  /
    Component Name:  bikesharing-email-api-v1
    Instance Name:   bikesharing-email-api
    Component Name:  bikesharing-feedback-api-v1
    Instance Name:   bikesharing-feedback-api
    Component Name:  bikesharing-profile-api-v1
    Instance Name:   bikesharing-profile-api
Events:

sowsan avatar Jan 07 '20 18:01 sowsan

I think events need to be deduped, otherwise they expire?

Also, it seems that what was added here https://github.com/oam-dev/rudr/pull/473/files isn't a summary of what really went wrong during the deployment. It's more like a cfg log event.
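For context: Kubernetes keeps events only for a limited time (one hour by default, controlled by the API server's --event-ttl flag), and a repeated event is normally folded into the existing one by bumping its count and last timestamp rather than creating a new object. A minimal sketch of that dedup idea follows. The EventStore class is a hypothetical in-memory stand-in, and expiring on the last-seen time is an assumption of the sketch, not a statement of how the API server implements its TTL.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Event:
    reason: str
    message: str
    count: int
    first_seen: float
    last_seen: float


class EventStore:
    """Hypothetical in-memory stand-in for the event API."""

    def __init__(self) -> None:
        self._events: Dict[Tuple[str, str, str], Event] = {}

    def record(self, obj: str, reason: str, message: str, now: float) -> Event:
        """Create the event, or fold a repeat into the existing one."""
        key = (obj, reason, message)
        ev = self._events.get(key)
        if ev is None:
            ev = Event(reason, message, 1, now, now)
            self._events[key] = ev
        else:
            ev.count += 1
            ev.last_seen = now
        return ev

    def expire(self, now: float, ttl: float = 3600.0) -> None:
        """Drop events that were last seen `ttl` or more seconds ago."""
        self._events = {
            k: e for k, e in self._events.items() if now - e.last_seen < ttl
        }

    def __len__(self) -> int:
        return len(self._events)
```

In this model a failure that keeps recurring stays visible as a single event with a growing count, while a one-off failure disappears once the TTL has passed.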

zhxu2 avatar Jan 23 '20 20:01 zhxu2