
EventListener using wrong URL for ClusterInterceptor

Open • quant-daddy opened this issue on Jun 7, 2022 • 8 comments

Expected Behavior

The EventListener should use the correct address for interceptors. I have configured the ClusterInterceptors to use https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/cel:

apiVersion: triggers.tekton.dev/v1alpha1
kind: ClusterInterceptor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"triggers.tekton.dev/v1alpha1","kind":"ClusterInterceptor","metadata":{"annotations":{},"name":"cel"},"spec":{"clientConfig":{"service":{"name":"tekton-triggers-core-interceptors","namespace":"tekton-pipelines","path":"cel"}}}}
  creationTimestamp: "2022-06-07T19:52:29Z"
  generation: 4
  name: cel
  resourceVersion: "498224521"
  uid: 37c35519-10c6-4784-9eda-7d09b40d890a
spec:
  clientConfig:
    caBundle: <redacted>
    url: https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/cel
status:
  address:
    url: https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/cel

Actual Behavior

The EventListener may be using the wrong URL, or there may be an issue with throttling:

el-listener-interceptor-79d9648974-hwshf event-listener I0607 22:39:07.736328       1 request.go:665] Waited for 1.194655471s due to client-side throttling, not priority and fairness, request: GET:https://10.100.0.1:443/apis/triggers.tekton.dev/v1alpha1/clusterinterceptors/cel

The tekton-triggers-core-interceptors Service is listed below:

tekton-triggers-core-interceptors   ClusterIP   10.100.14.118   <none>        8443/TCP                             168m

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:51:05Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8-gke.201", GitCommit:"2dca91e5224568a093c27d3589aa0a96fd3ddc9a", GitTreeState:"clean", BuildDate:"2022-05-11T18:39:02Z", GoVersion:"go1.16.14b7", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

Client version: 0.23.1
Pipeline version: v0.36.0
Triggers version: v0.20.0
Dashboard version: v0.26.0

quant-daddy • Jun 07 '22

I'm getting a similar issue with the same version:

EventListener log:

{"level":"error","ts":"2022-06-17T09:58:36.185Z","logger":"eventlistener","caller":"sink/sink.go:381","msg":"Post \"https://tekton-triggers-core-interceptors.tekton-pipelines.svc:80/github\": dial tcp 10.4.2.204:80: i/o timeout","eventlistener":"default","namespace":"platform","/triggers-eventid":"76aa89a7-f421-4efb-8601-a3b12850dd09","eventlistenerUID":"7c4a390c-ee58-42bb-825f-5ce8c16147e6","/triggers-eventid":"76aa89a7-f421-4efb-8601-a3b12850dd09","/trigger":"infrastructure-utils-publish","stacktrace":"github.com/tektoncd/triggers/pkg/sink.Sink.processTrigger\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:381\ngithub.com/tektoncd/triggers/pkg/sink.Sink.HandleEvent.func1\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:196"}
NAME                                TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                              AGE
tekton-triggers-core-interceptors   ClusterIP   10.4.2.204    <none>        8443/TCP                             48d

It's trying to reach https://tekton-triggers-core-interceptors.tekton-pipelines.svc:80 even though the Service is configured for 8443. I tried manually changing the Service port (8443 -> 80) and got:

2022/06/17 10:06:43 http: TLS handshake error from 10.0.2.139:58702: remote error: tls: bad certificate
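One way to read this log: the EventListener builds the interceptor URL from spec.clientConfig, and when service.port is omitted some default gets filled in, which here shows up as :80. The Go sketch below is a hypothetical model of that resolution, not the Triggers source. An explicit url is returned unchanged (consistent with status.address.url matching spec.clientConfig.url in the first YAML in this issue); otherwise https://<name>.<namespace>.svc:<port>/<path> is assembled. The types serviceRef and clientConfig and the constant assumedDefaultPort are made up for illustration, the scheme is always https to match the URLs in these logs, and the default of 80 is inferred only from this log.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
)

// serviceRef mirrors spec.clientConfig.service from the YAML in this thread.
// Hypothetical type for illustration, not the Triggers Go API.
type serviceRef struct {
	Name      string
	Namespace string
	Path      string
	Port      *int32 // nil when the port field is omitted
}

// clientConfig mirrors spec.clientConfig (hypothetical type).
type clientConfig struct {
	URL     string      // spec.clientConfig.url; empty when unset
	Service *serviceRef // spec.clientConfig.service; nil when unset
}

// assumedDefaultPort is an assumption inferred from the ":80" in the
// EventListener log above; it is not taken from the Triggers source.
const assumedDefaultPort int32 = 80

// resolveAddress returns the URL an EventListener would POST to under the
// assumptions above.
func resolveAddress(cc clientConfig) (string, error) {
	if cc.URL != "" {
		return cc.URL, nil // an explicit url is used as-is
	}
	if cc.Service == nil {
		return "", errors.New("clientConfig sets neither url nor service")
	}
	port := assumedDefaultPort
	if cc.Service.Port != nil {
		port = *cc.Service.Port
	}
	host := cc.Service.Name + "." + cc.Service.Namespace + ".svc"
	u := url.URL{
		Scheme: "https",
		Host:   net.JoinHostPort(host, strconv.Itoa(int(port))),
		Path:   "/" + strings.TrimPrefix(cc.Service.Path, "/"),
	}
	return u.String(), nil
}

func main() {
	svc := &serviceRef{Name: "tekton-triggers-core-interceptors", Namespace: "tekton-pipelines", Path: "github"}
	addr, _ := resolveAddress(clientConfig{Service: svc})
	fmt.Println(addr) // port omitted: falls back to the assumed default
	p := int32(443)
	svc.Port = &p
	addr, _ = resolveAddress(clientConfig{Service: svc})
	fmt.Println(addr) // port set explicitly
}
```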

freefood89 • Jun 17 '22

Actually, I set spec.clientConfig.service.port to 443 in both ClusterInterceptors (cel and github), changed the port of svc/tekton-triggers-core-interceptors to 443, and restarted the EventListener to see whether the certs would get created. That resolved my issue.

This may be related to https://github.com/tektoncd/triggers/issues/1368.
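A Kubernetes Service forwards traffic arriving on spec.ports[].port to the pod's targetPort, so the port a ClusterInterceptor is dialed on has to match one of the Service's ports, which is what setting both sides to 443 achieves. A hypothetical helper for spotting the mismatch from the previous comment (dialing :80 while the Service exposes 8443) could look like this in Go. servicePort and checkInterceptorPort are made-up names, and keeping targetPort at 8443 after the change is an assumption about how the Service was edited. It only covers the port half of the problem; the "tls: bad certificate" error seen when the Service port was switched to 80 is a separate, certificate-side issue.

```go
package main

import "fmt"

// servicePort mirrors one entry of a Service's spec.ports (hypothetical type).
type servicePort struct {
	Name       string
	Port       int32 // port clients dial on the Service's ClusterIP or DNS name
	TargetPort int32 // container port the traffic is forwarded to
}

// checkInterceptorPort returns an error when the port a ClusterInterceptor is
// dialed on is not one of the ports its Service exposes.
func checkInterceptorPort(dialPort int32, ports []servicePort) error {
	exposed := make([]int32, 0, len(ports))
	for _, p := range ports {
		if p.Port == dialPort {
			return nil
		}
		exposed = append(exposed, p.Port)
	}
	return fmt.Errorf("interceptor is dialed on port %d, but the Service exposes %v", dialPort, exposed)
}

func main() {
	// Before: the EventListener dialed :80 while the Service exposed 8443.
	fmt.Println(checkInterceptorPort(80, []servicePort{{Name: "https", Port: 8443, TargetPort: 8443}}))
	// After: both sides use 443; targetPort 8443 is an assumption about the pod.
	fmt.Println(checkInterceptorPort(443, []servicePort{{Name: "https", Port: 443, TargetPort: 8443}}))
}
```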

freefood89 • Jun 17 '22

Hey @quant-daddy, sorry for the late response. Are you still facing this issue?

If so, could you tell me the steps you executed?

Note: if you specify the url field in the clientConfig, as in

clientConfig:
    caBundle: <redacted>
    url: https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/cel

that means you have written your own HTTPS interceptor, and the caBundle is what your ClusterInterceptor uses to verify the connection.

Even when writing the Kubernetes Service for a ClusterInterceptor, you need to take care of the ports. Here is the reference PR for this: https://github.com/tektoncd/triggers/pull/1379

savitaashture • Jul 01 '22

I believe this issue is more related to https://github.com/tektoncd/triggers/issues/1284. @quant-daddy, can you confirm that the IP in your URL is your API server? I'm also seeing this error in the 0.20.1 release of Triggers. The issue got better after 0.19.0, but this message still shows up and the timeouts still happen.

joaosilva15 • Jul 04 '22

@joaosilva15 Yes, the URL was for the API server. I think this could be related to API request throttling in newer versions of Kubernetes. With the introduction of the ClusterInterceptor CRD, I think the EventListener has to query the API server for the ClusterInterceptor data for every event it receives. If we receive a lot of events (most of them not useful), that triggers rate limiting/throttling of those requests. I'm speaking from the little research I did when I faced the issue a few weeks back. To work around it, I temporarily disabled the emission of internal Tekton events for TaskRuns/PipelineRuns, but this is of course only a temporary fix.

@savitaashture I can confirm that the right port and URI were being used and that the connection was successful.

quant-daddy • Jul 04 '22

Hmm, yeah, we might be making direct calls to the API server instead of going through the lister cache.

dibyom • Jul 07 '22

Hope it gets fixed soon! Thanks, @dibyom.

quant-daddy • Jul 11 '22

Hi @quant-daddy

Could you try the latest Triggers release, v0.20.2?

If the issue persists even with v0.20.2, please provide the steps to reproduce it.

Thank you

savitaashture • Jul 28 '22

I am currently facing this issue with a fresh install of the v0.20.2 Triggers release (https://github.com/tektoncd/triggers/releases/tag/v0.20.2). I have a ClusterInterceptor installed like this:

apiVersion: triggers.tekton.dev/v1alpha1
kind: ClusterInterceptor
metadata:
  creationTimestamp: "2022-08-17T02:26:29Z"
  generation: 2
  labels:
    server/type: https
  name: gitlab
  resourceVersion: "64651695"
  uid: 68d1afbd-54f6-4e13-8088-5f4462d99e69
spec:
  clientConfig:
    caBundle: <removed>
    service:
      name: tekton-triggers-core-interceptors
      namespace: tekton-pipelines
      path: gitlab
      port: 8443

A Service like this:

apiVersion: v1
kind: Service
metadata:
  annotations:
  creationTimestamp: "2022-08-17T19:26:39Z"
  labels:
    app: tekton-triggers-core-interceptors
    app.kubernetes.io/component: interceptors
    app.kubernetes.io/instance: default
    app.kubernetes.io/name: tekton-triggers-core-interceptors
    app.kubernetes.io/part-of: tekton-triggers
    app.kubernetes.io/version: v0.20.2
    triggers.tekton.dev/release: v0.20.2
    version: v0.20.2
  name: tekton-triggers-core-interceptors
  namespace: tekton-pipelines
  resourceVersion: "65222209"
  uid: 2f0f7659-3d52-495a-aed6-9adca0cef587
spec:
  clusterIP: 10.98.17.155
  clusterIPs:
  - 10.98.17.155
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    port: 8443
    protocol: TCP
    targetPort: 8443
  selector:
    app.kubernetes.io/component: interceptors
    app.kubernetes.io/instance: default
    app.kubernetes.io/name: core-interceptors
    app.kubernetes.io/part-of: tekton-triggers
  sessionAffinity: None
  type: ClusterIP

And an EventListener like this:

---
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: gitlab-fedora-kickstart
  namespace: sway-sig
spec:
  serviceAccountName: tekton-triggers
  triggers:
    - name: gitlab-pipeline-events-trigger
      interceptors:
        - name: "verify-gitlab-payload"
          ref:
            name: "gitlab"
            kind: ClusterInterceptor
          params:
            - name: secretRef
              value:
                secretName: "gitlab-webhook"
                secretKey: "secretToken"
            - name: eventTypes
              value:
                - "Pipeline Hook"
        - name: "CEL filter: only when pipelines are successful on sway branch"
          ref:
            name: "cel"
          params:
          - name: "success"
            value: "body.object_attributes.status == 'success'"
          - name: "no-mr"
            value: "body.object_attributes.source != 'merge_request_event'"
      template:
        spec:
          resourcetemplates:
            - apiVersion: tekton.dev/v1beta1
              kind: PipelineRun
              metadata:
                generateName: kickstart-iso-to-prod
                namespace: sway-sig
              spec:
                pipelineRef:
                  name: kickstart-to-prod
                workspaces:
                  - name: prod
                    persistentVolumeClaim:
                      claimName: sway-nginx

When the EventListener is triggered, this results in:

2022/08/22 17:06:39 http: TLS handshake error from 10.244.9.106:59974: remote error: tls: bad certificate

{"level":"error","ts":"2022-08-22T17:06:39.345Z","logger":"eventlistener","caller":"sink/sink.go:381","msg":"Post \"https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/gitlab\": x509: certificate signed by unknown authority","eventlistener":"gitlab-fedora-kickstart","namespace":"sway-sig","/triggers-eventid":"70a73222-0462-463e-8556-cd5e141cf5c2","eventlistenerUID":"5d5233b3-119c-4827-9789-dbdd7dbe55b3","/triggers-eventid":"70a73222-0462-463e-8556-cd5e141cf5c2","/trigger":"gitlab-pipeline-events-trigger","stacktrace":"github.com/tektoncd/triggers/pkg/sink.Sink.processTrigger\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:381\ngithub.com/tektoncd/triggers/pkg/sink.Sink.HandleEvent.func1\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:196"}

anthr76 • Aug 22 '22

Hi @anthr76, I tried the steps you mentioned, but I don't see such an error.

That's because, even if we remove the caBundle from the ClusterInterceptor:

apiVersion: triggers.tekton.dev/v1alpha1
kind: ClusterInterceptor
metadata:
  creationTimestamp: "2022-08-17T02:26:29Z"
  generation: 2
  labels:
    server/type: https
  name: gitlab
  resourceVersion: "64651695"
  uid: 68d1afbd-54f6-4e13-8088-5f4462d99e69
spec:
  clientConfig:
    caBundle: <removed>
    service:
      name: tekton-triggers-core-interceptors
      namespace: tekton-pipelines
      path: gitlab
      port: 8443

Triggers checks the core ClusterInterceptors every minute and, if there is no caBundle, adds it. Because of that, we don't see the "tls: bad certificate" error.

savitaashture • Aug 23 '22

Could you provide the step-by-step instructions you followed that led to the issue above?

savitaashture • Aug 23 '22

Sure, I'll provide the live manifests in case that helps.

I deploy Tekton like: https://github.com/anthr76/infra/blob/tekton-sway-sig/k8s/base/tekton-pipelines/deploy/kustomization.yaml

Set up an event listener like: https://github.com/anthr76/infra/blob/tekton-sway-sig/k8s/base/sway-sig/eventlisteners/gitlab-listener.yaml

Put an ingress on the eventlistener: https://github.com/anthr76/infra/blob/tekton-sway-sig/k8s/base/sway-sig/eventlisteners/ingress.yaml

Have GitLab send a POST to the ingress, which results in "Hook executed successfully: HTTP 202".

Observe the el-gitlab-fedora-kickstart pod throw the error:

{"level":"error","ts":"2022-08-23T15:20:16.862Z","logger":"eventlistener","caller":"sink/sink.go:381","msg":"Post \"https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/gitlab\": x509: certificate signed by unknown authority","eventlistener":"gitlab-fedora-kickstart","namespace":"sway-sig","/triggers-eventid":"5dd5ff68-ccd0-4059-8e45-2c1167dbde23","eventlistenerUID":"85bf0ae7-a6d5-4338-8954-1c0b75f5d667","/triggers-eventid":"5dd5ff68-ccd0-4059-8e45-2c1167dbde23","/trigger":"gitlab-pipeline-events-trigger","stacktrace":"github.com/tektoncd/triggers/pkg/sink.Sink.processTrigger\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:381\ngithub.com/tektoncd/triggers/pkg/sink.Sink.HandleEvent.func1\n\tgithub.com/tektoncd/triggers/pkg/sink/sink.go:196"}

Observe the tekton-triggers-core-interceptors pod throw the error:

2022/08/23 15:20:16 http: TLS handshake error from 10.244.7.132:36832: remote error: tls: bad certificate

Try basic debugging in a netshoot pod:

tmp-shell-1 ~ curl -k https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/gitlab
failed to parse body as InterceptorRequest: unexpected end of JSON input

tmp-shell-1 ~ curl https://tekton-triggers-core-interceptors.tekton-pipelines.svc:8443/gitlab
curl: (60) SSL certificate problem: self signed certificate
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

To me, it seems like the EventListener is unaware of the TLS certificate it needs to communicate with the ClusterInterceptor.

The eventlistener is configured with this RBAC https://github.com/anthr76/infra/blob/tekton-sway-sig/k8s/base/sway-sig/eventlisteners/rbac.yaml

anthr76 • Aug 23 '22

After looking closer at this issue, and given that nobody else could reproduce it, I ended up removing all Tekton-related CRDs and the tekton-pipelines namespace itself. After doing so, the error went away. I'm not sure which lingering resource caused the problem, but if I find out I will update this post.

anthr76 • Aug 29 '22

@anthr76 Thanks a lot.

Given your comment, I'm closing this for now; please reopen it if you face the issue again.

savitaashture • Aug 29 '22