jaeger-operator icon indicating copy to clipboard operation
jaeger-operator copied to clipboard

CR failures should be reported via status object

Open jkandasa opened this issue 5 years ago • 5 comments

This issue somehow related to #466, however this may need a different fix

Invalid CR files error should be reported to the user console. And should not keep on trying to deploy the CR file.

I changed the boolean value to string (eg: true >> "true") and integer to string (ex: 10 >> "10") deployed the CR file. on the console, it looks all ok. But on the operator-log keep on throwing an exception.

Expected behavior:

  • The error should be thrown on the user console [OR] the client side
  • AFAIK, There is no use to keep on trying to deploy the invalid CR file. Stop it on the first failure
  • I believe if we read everything as a string and convert it to the specific format may avoid most of the format issues.

No exception on the console.

$ oc create -f jaeger-services-final.yaml 
jaeger.jaegertracing.io/jaegerqe created
  storage:
    type: elasticsearch
    esIndexCleaner:
      enabled: true
      schedule: "*/10 * * * *"
      numberOfDays: "2"

operator log file: jaeger-operator-545c69455d-4clq5.log

CR file:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: "jaegerqe"
spec:
  ingress:
    security: none
  strategy: production
  collector:
    replicas: 1
    image: jaegertracing/jaeger-collector:1.12
    resources:
      requests:
        memory: "512Mi"
        cpu: "1"
      limits:
        memory: "512Mi"
        cpu: "1"
    options:
      log-level: info
      metrics-backend: prometheus
      collector:
        num-workers: 1
        queue-size: 20000
      es:
        bulk:
          size: 524288
          workers: 1
          flush-interval: 200ms
        tags-as-fields:
          all: false
  query:
    replicas: 1
    image: jaegertracing/jaeger-query:1.12
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
    options:
      log-level: "info"
      metrics-backend: prometheus
      query:
        port: 16686
      es:
        timeout: 10s
  agent:
    strategy: sidecar
    image: jaegertracing/jaeger-agent:1.12
    resources:
      requests:
        cpu: "200m"
        memory: "128Mi"
      limits:
        cpu: "200m"
        memory: "128Mi"
    options:
      log-level: info
      metrics-backend: prometheus
      processor:
        jaeger-compact:
          server-queue-size: 10000
          workers: 10
  storage:
    type: elasticsearch
    esIndexCleaner:
      enabled: true
      schedule: "*/10 * * * *"
      numberOfDays: "2"      
    dependencies:
      enabled: false
    elasticsearch:
      image: quay.io/openshift/origin-logging-elasticsearch5:latest
      nodeCount: 3
      resources:

jkandasa avatar Jun 07 '19 06:06 jkandasa

But on the operator-log keep on throwing an exception.

What's the exception?

jpkrohling avatar Jun 20 '19 07:06 jpkrohling

@jpkrohling Operator log is attached on the initial comment. E0607 06:27:47.976474 1 reflector.go:125] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:93: Failed to list *v1.Jaeger: v1.JaegerList.Items: []v1.Jaeger: v1.Jaeger.Spec: v1.JaegerSpec.Storage: v1.JaegerStorageSpec.EsIndexCleaner: v1.JaegerEsIndexCleanerSpec.NumberOfDays: readUint64: unexpected character: �, error found in #10 byte of ...|rOfDays":"2","schedu|..., bigger context ...|,"esIndexCleaner":{"enabled":true,"numberOfDays":"2","schedule":"*/10 * * * *"},"type":"elasticsearc|...

jkandasa avatar Jun 20 '19 12:06 jkandasa

Sorry, I missed that. Whenever possible, let's try to keep the error messages in the issue itself, so that people searching for the error message can find this bug report.

jpkrohling avatar Jun 20 '19 12:06 jpkrohling

I changed the scope of this issue a bit, but the idea is the same: users creating a CR should be able to see the error messages that happened during the process. Storing the error in a status object allows it to be viewed via kubectl describe jaeger

jpkrohling avatar Jul 16 '19 12:07 jpkrohling

I recommend a Validating Webhook rather than admitting the jaegertracing.io/v1 Jaeger and then setting its Status.

The only time it is reasonable to set the Status is if the status can't be checked at admission time. (For example, a Jaeger can have a priorityClassName: collector-high-priority but at admission time the scheduling.k8s.io/v1 PriorityClass might not exist yet.). The only way to implement this is to watch for changes on the CR and every kind of object the CR might refer to, which is why this kind of checking is rarely implemented. (In Istio that kind of checking is opt-in because of performance problems.)

esnible avatar Mar 21 '22 16:03 esnible