AutoTLS with cert-manager creating kcert, but not cert

Open girlpunk opened this issue 2 years ago • 13 comments

Currently trying to set up Knative Serving AutoTLS on a bare-metal cluster. The cluster has an existing cert-manager and ClusterIssuer, which has previously been used to generate certificates for services, showing that the issuer is working as expected. After installing Knative using the operator, I successfully set up a service without TLS, which suggests that Knative is also working as expected.

Once AutoTLS is enabled, the Knative service shows a state of "unknown" with the reason "CertificateNotReady". I can see a Knative certificate (kcert) has been created, but its "ready", "reason", and "events" are all empty. From looking at other issues, I can see that a cert-manager certificate should be created, though this doesn't appear to be happening.

kubectl describe knativeservings.operator.knative.dev -n knative-serving knative-serving

Name:         knative-serving
Namespace:    knative-serving
Labels:       networking.knative.dev/certificate-provider=cert-manager
Annotations:  <none>
API Version:  operator.knative.dev/v1beta1
Kind:         KnativeServing
Metadata:
  Creation Timestamp:  2022-04-21T21:10:21Z
  Finalizers:
    knativeservings.operator.knative.dev
  Generation:  8
  Resource Version:  106678848
  UID:               2d438f12-4873-4c49-9df4-a0afc8453c5e
Spec:
  Config:
    Autoscaler:
      Enable - Scale - To - Zero:  true
    Certmanager:
      Issuer Ref:  - kind: ClusterIssuer
- name: letsencrypt

    Domain:
      knative.my-domain-here.com:  
    Network:
      Auto - Tls:       Enabled
      Http - Protocol:  Redirected
      Ingress - Class:  kourier.ingress.networking.knative.dev
  Controller - Custom - Certs:
    Name:  
    Type:  
  Ingress:
    Contour:
      Enabled:  false
    Istio:
      Enabled:  false
    Kourier:
      Enabled:  true
  Registry:
Status:
  Conditions:
    Last Transition Time:  2022-04-21T21:45:09Z
    Status:                True
    Type:                  DependenciesInstalled
    Last Transition Time:  2022-04-21T21:45:43Z
    Status:                True
    Type:                  DeploymentsAvailable
    Last Transition Time:  2022-04-21T21:45:09Z
    Status:                True
    Type:                  InstallSucceeded
    Last Transition Time:  2022-04-21T21:45:43Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2022-04-21T21:10:21Z
    Status:                True
    Type:                  VersionMigrationEligible
  Manifests:
    /var/run/ko/knative-serving/1.3.1
  Observed Generation:  8
  Version:              1.3.1
Events:                 <none>

kubectl describe ksvc -n earthwalker earthwalker

Name:         earthwalker
Namespace:    earthwalker
Labels:       <none>
Annotations:  networking.knative.dev/ingress.class: kourier.ingress.networking.knative.dev
              scale-to-zero-grace-period: 300s
              serving.knative.dev/creator: kubernetes-admin
              serving.knative.dev/lastModifier: kubernetes-admin
API Version:  serving.knative.dev/v1
Kind:         Service
Metadata:
  Creation Timestamp:  2022-04-22T20:50:20Z
  Generation:          1
  Resource Version:  106680182
  UID:               c5640172-7fd3-47ae-8881-7f1718489015
Spec:
  Template:
    Metadata:
      Annotations:
        autoscaling.knative.dev/target:        200
        networking.knative.dev/ingress.class:  kourier.ingress.networking.knative.dev
        Scale - To - Zero - Grace - Period:    300s
      Creation Timestamp:                      <nil>
      Labels:
        Deployment:  earthwalker
      Name:          earthwalker-svc
      Namespace:     earthwalker
    Spec:
      Container Concurrency:  0
      Containers:
        Env:
          Name:   EARTHWALKER_CONFIG_PATH
          Value:  /config/config.toml
          Name:   EARTHWALKER_PORT
          Value:  8080
        Image:    registry.gitlab.com/glatteis/earthwalker:latest
        Name:     earthwalker
        Ports:
          Container Port:  8080
          Protocol:        TCP
        Readiness Probe:
          Success Threshold:  1
          Tcp Socket:
            Port:  0
        Resources:
          Limits:
            Cpu:     500m
            Memory:  128Mi
          Requests:
            Cpu:     10m
            Memory:  64Mi
        Volume Mounts:
          Mount Path:        /config
          Name:              earthwalker-config
          Read Only:         true
          Sub Path:          config.toml
      Enable Service Links:  false
      Timeout Seconds:       300
      Volumes:
        Config Map:
          Items:
            Key:   config.toml
            Path:  config.toml
          Name:    earthwalker-config
        Name:      earthwalker-config
  Traffic:
    Latest Revision:  true
    Percent:          100
Status:
  Address:
    URL:  http://earthwalker.earthwalker.svc.cluster.local
  Conditions:
    Last Transition Time:        2022-04-22T20:50:28Z
    Status:                      True
    Type:                        ConfigurationsReady
    Last Transition Time:        2022-04-22T20:50:29Z
    Message:                     Certificate route-262671d3-9e58-45e7-b769-13e2291324e8 is not ready.
    Reason:                      CertificateNotReady
    Status:                      Unknown
    Type:                        Ready
    Last Transition Time:        2022-04-22T20:50:29Z
    Message:                     Certificate route-262671d3-9e58-45e7-b769-13e2291324e8 is not ready.
    Reason:                      CertificateNotReady
    Status:                      Unknown
    Type:                        RoutesReady
  Latest Created Revision Name:  earthwalker-svc
  Latest Ready Revision Name:    earthwalker-svc
  Observed Generation:           1
  Traffic:
    Latest Revision:  true
    Percent:          100
    Revision Name:    earthwalker-svc
  URL:                https://earthwalker.earthwalker.knative.my-domain-here.com
Events:
  Type    Reason   Age   From                Message
  ----    ------   ----  ----                -------
  Normal  Created  32m   service-controller  Created Configuration "earthwalker"
  Normal  Created  32m   service-controller  Created Route "earthwalker"

kubectl describe kcert -n earthwalker route-262671d3-9e58-45e7-b769-13e2291324e8

Name:         route-262671d3-9e58-45e7-b769-13e2291324e8
Namespace:    earthwalker
Labels:       serving.knative.dev/route=earthwalker
Annotations:  networking.knative.dev/certificate.class: cert-manager.certificate.networking.knative.dev
              networking.knative.dev/ingress.class: kourier.ingress.networking.knative.dev
              scale-to-zero-grace-period: 300s
              serving.knative.dev/creator: kubernetes-admin
              serving.knative.dev/lastModifier: kubernetes-admin
API Version:  networking.internal.knative.dev/v1alpha1
Kind:         Certificate
Metadata:
  Creation Timestamp:  2022-04-22T21:11:23Z
  Generation:          1
  Owner References:
    API Version:           serving.knative.dev/v1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  Route
    Name:                  earthwalker
    UID:                   262671d3-9e58-45e7-b769-13e2291324e8
  Resource Version:        106686685
  UID:                     4e9b0b8f-873b-4e73-a65e-bf28d0e36eed
Spec:
  Dns Names:
    earthwalker.earthwalker.knative.my-domain-here.com
  Secret Name:  route-262671d3-9e58-45e7-b769-13e2291324e8
Events:         <none>
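
The symptom above (a kcert exists, but no cert-manager Certificate) can be checked directly by querying both resource kinds under their fully qualified names, since both are called `Certificate`. The following is only a sketch: the function name `compare_certs` is hypothetical, and it assumes the cert-manager Certificate carries the same name as the kcert, as it does in the working output later in this thread.

```shell
# Hypothetical helper: report whether a cert-manager Certificate exists
# for a given Knative Certificate (kcert) in a namespace.
compare_certs() {
  local ns="$1" name="$2"

  # Knative's internal Certificate, created for the Route.
  kubectl get certificates.networking.internal.knative.dev "$name" -n "$ns" || return 1

  # cert-manager's Certificate, assumed to share the kcert's name.
  if kubectl get certificates.cert-manager.io "$name" -n "$ns" >/dev/null 2>&1; then
    echo "cert-manager Certificate $name exists"
  else
    echo "cert-manager Certificate $name is missing"
    # net-certmanager is the component that acts on kcerts, so list its
    # deployments to confirm they are present and available.
    kubectl get deploy -n knative-serving net-certmanager-controller net-certmanager-webhook
  fi
}
```

For the service in this report, the call would be `compare_certs earthwalker route-262671d3-9e58-45e7-b769-13e2291324e8`.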

girlpunk avatar Apr 22 '22 21:04 girlpunk

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Jul 22 '22 01:07 github-actions[bot]

/remove-lifecycle stale

girlpunk avatar Jul 28 '22 10:07 girlpunk

Same here on version 1.7. Did you ever solve this issue?

JCzz avatar Sep 07 '22 09:09 JCzz

Unfortunately not, which is a pity, as Knative looks like a good fit for my use case but is fairly useless if it doesn't work.

girlpunk avatar Sep 07 '22 15:09 girlpunk

Assuming you followed all the steps here, right? And the cert-manager on your cluster is v1.0+?

psschwei avatar Sep 07 '22 17:09 psschwei

I can confirm I have:

  • knative-operator 1.7.0
  • knative-serving 1.7.1
  • kourier 1.7.0
  • cert-manager 1.9.1
  • knative configured with a custom domain, which has a wildcard A record set to knative's ingress IP
  • cert-manager configured for HTTP-01 validations, and confirmed working with the haproxy ingress

I believe I may have located the source of the problem. Despite having configured the ClusterIssuer letsencrypt in the KnativeServing resource, the config-certmanager configmap still has an example configuration.

$ kubectl get knativeservings.operator.knative.dev -n knative-serving knative-serving -o yaml
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  finalizers:
  - knativeservings.operator.knative.dev
  labels:
    networking.knative.dev/certificate-provider: cert-manager
  name: knative-serving
[...]
spec:
  config:
    certmanager:
      issuerRef: |
        kind: ClusterIssuer
        name: letsencrypt
[...]
$ kubectl describe configmaps -n knative-serving config-certmanager
Name:         config-certmanager
Namespace:    knative-serving
Labels:       app.kubernetes.io/component=net-certmanager
              app.kubernetes.io/name=knative-serving
              app.kubernetes.io/version=1.7.0
              networking.knative.dev/certificate-provider=cert-manager
Annotations:  <none>

Data
====
_example:
----
################################
#                              #
#    EXAMPLE CONFIGURATION     #
#                              #
################################

# This block is not actually functional configuration,
# but serves to illustrate the available configuration
# options and document them in a way that is accessible
# to users that `kubectl edit` this config map.
#
# These sample configuration options may be copied out of
# this block and unindented to actually change the configuration.

# issuerRef is a reference to the issuer for this certificate.
# IssuerRef should be either `ClusterIssuer` or `Issuer`.
# Please refer `IssuerRef` in https://github.com/cert-manager/cert-manager/tree/master/pkg/apis/certmanager/v1/types_certificate.go
# for more details about IssuerRef configuration.
issuerRef: |
  kind: ClusterIssuer
  name: letsencrypt-issuer


BinaryData
====

Events:  <none>

girlpunk avatar Sep 09 '22 21:09 girlpunk

Looks like this is being caused by a permissions issue, is there a way to check the operator logs to see why policies aren't getting set up correctly?

W0909 21:59:29.092786       1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.Secret: Unauthorized
E0909 21:59:29.092821       1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: Unauthorized
W0909 21:59:33.456959       1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.ConfigMap: Unauthorized
E0909 21:59:33.457001       1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: Unauthorized
E0909 21:59:36.207204       1 leaderelection.go:330] error retrieving resource lock knative-serving/net-certmanager-webhook.configmapwebhook.00-of-01: Unauthorized
E0909 21:59:41.279851       1 leaderelection.go:330] error retrieving resource lock knative-serving/net-certmanager-webhook.webhookcertificates.00-of-01: Unauthorized
E0909 21:59:56.144964       1 leaderelection.go:330] error retrieving resource lock knative-serving/net-certmanager-webhook.configmapwebhook.00-of-01: Unauthorized
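
One thing that can be ruled out here is RBAC: `kubectl auth can-i` reports whether a given service account is allowed to list the resources named in the log lines. This is only a sketch; the function name `check_sa_access` is hypothetical, and the service account to test (`controller` in `knative-serving`) is an assumption about which account the failing pod runs as. Note also that the API server normally answers RBAC denials with "Forbidden", while "Unauthorized" usually points at authentication (for example a rejected token), so a clean result here does not by itself explain the log.

```shell
# Hypothetical helper: check whether a service account may list the resource
# kinds that appear in the "Unauthorized" log lines above.
check_sa_access() {
  local ns="$1" sa="$2" failed=0 resource

  for resource in secrets configmaps; do
    # `kubectl auth can-i` exits non-zero when the action is not allowed.
    if kubectl auth can-i list "$resource" -n "$ns" \
        --as="system:serviceaccount:${ns}:${sa}" >/dev/null 2>&1; then
      echo "$resource: allowed"
    else
      echo "$resource: denied"
      failed=1
    fi
  done

  return "$failed"
}
```

Example: `check_sa_access knative-serving controller`. The log lines themselves can be followed with `kubectl logs -n knative-serving deploy/net-certmanager-webhook`.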

girlpunk avatar Sep 09 '22 22:09 girlpunk

OK, so I found another issue that suggests the permissions need to be set up manually. I did that by setting up a custom role binding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: knative-serving-certmanager
  labels:
    operator.knative.dev/release: "v1.7.0"
    app.kubernetes.io/version: "1.7.0"
    app.kubernetes.io/part-of: knative-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: knative-serving-certmanager
subjects:
  - kind: ServiceAccount
    name: controller
    namespace: knative-serving

Having done that, things are definitely getting further, but certificates still aren't being generated correctly. I managed to find some log entries from the kourier controller saying that the solver services are missing, but no logs from anything else about trying or failing to create them.

{"severity":"WARNING","timestamp":"2022-09-10T11:57:58.637513758Z","logger":"net-kourier-controller","caller":"generator/ingress_translator.go:137","message":"Service 'earthwalker/cm-acme-http-solver-6422x' not yet created","commit":"09b107b-dirty","knative.dev/controller":"knative.dev.net-kourier.pkg.reconciler.ingress.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"51ab5d29-e64a-40a9-b11c-03a03ca0ddea","knative.dev/key":"earthwalker/earthwalker"}
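
When the solver Service named in that warning never appears, the cert-manager side of the chain (Certificate, CertificateRequest, Order, Challenge) is usually the next place to look. The sketch below lists those objects for one namespace. The function name `acme_status` is hypothetical, and the label selector `acme.cert-manager.io/http01-solver=true` is an assumption about how cert-manager labels its HTTP-01 solver pods and services; drop the `-l` flag if it matches nothing.

```shell
# Hypothetical helper: list the cert-manager objects involved in an
# HTTP-01 issuance for one namespace.
acme_status() {
  local ns="$1"

  # Certificate and the CertificateRequest derived from it.
  kubectl get certificates.cert-manager.io,certificaterequests.cert-manager.io -n "$ns"

  # ACME Order and its Challenges; a Challenge's status carries the reason
  # when validation is pending or failing.
  kubectl get orders.acme.cert-manager.io,challenges.acme.cert-manager.io -n "$ns"

  # Solver pods and services (label selector is an assumption, see above).
  kubectl get pods,services -n "$ns" -l acme.cert-manager.io/http01-solver=true
}
```

Example: `acme_status earthwalker`, followed by `kubectl describe challenges.acme.cert-manager.io -n earthwalker` for any Challenge that is not valid.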

girlpunk avatar Sep 10 '22 12:09 girlpunk

How did you install net-certmanager? (Please note that cert-manager is different from Knative's net-certmanager.) The operator does not have an option to install net-certmanager, so you need to install it manually, either with the following command or via spec.additionalManifests as described in https://github.com/knative/operator/issues/950.

$ kubectl apply -f https://github.com/knative/net-certmanager/releases/download/knative-v1.7.0/release.yaml

nak3 avatar Sep 11 '22 09:09 nak3

From a kustomization.yaml file:

bases:
- https://github.com/knative/operator/releases/download/knative-v1.7.0/operator.yaml
- https://github.com/knative/net-certmanager/releases/download/knative-v1.7.0/release.yaml

resources:
- roleBinding.yaml

I can see that both the net-certmanager-controller and net-certmanager-webhook deployments exist and are running 1 pod each.

girlpunk avatar Sep 11 '22 12:09 girlpunk

Thank you. Hmm... I tried to reproduce this, but AutoTLS works for me without any permission issues. I am sharing the steps I followed below, so could you double-check whether there are any steps you missed? I think you are doing it correctly, though.

1. Deploy operator

export VERSION=knative-v1.7.0
kubectl apply -f https://github.com/knative/operator/releases/download/$VERSION/operator.yaml
kubectl wait deploy --all --for=condition=Available

2. Deploy knative serving

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  ingress:
    kourier:
      enabled: true
  config:
    network:
      auto-tls: Enabled
      http-protocol: Redirected
      ingress-class: kourier.ingress.networking.knative.dev
EOF

kubectl wait deploy --all --for=condition=Available -n knative-serving

3. Deploy net-certmanager

kubectl apply --filename https://github.com/knative/net-certmanager/releases/download/$VERSION/release.yaml

4. Deploy cert-manager

export SERVING_REPO=${GOPATH}/src/knative.dev/serving

kubectl apply -f ${SERVING_REPO}/third_party/cert-manager-latest/
kubectl wait deploy --all --for=condition=Available -n cert-manager

NOTE: ${GOPATH}/src/knative.dev/serving is a local checkout of the knative/serving repo.

5. Deploy caissuer

kubectl apply -f ${SERVING_REPO}/test/config/autotls/certmanager/caissuer/

EDIT: kubectl patch cm config-network -n "knative-serving" -p '{"data":{"autoTLS":"Enabled"}}' is not a good approach when using the operator. I re-tested with the configuration set in the KnativeServing CR and updated the instructions.

6. Deploy ksvc & verify the autoTLS

kn service create hello-example --image=gcr.io/knative-samples/helloworld-go
$ kubectl  get ksvc
NAME            URL                                         LATESTCREATED         LATESTREADY           READY   REASON
hello-example   https://hello-example.default.example.com   hello-example-00001   hello-example-00001   True

$ kubectl  get kcert
NAME                                         READY   REASON
route-e2d9d6a1-8601-4d58-8952-62f6229d13f2   True

$ kubectl  get cert
NAME                                         READY   SECRET                                       AGE
route-e2d9d6a1-8601-4d58-8952-62f6229d13f2   True    route-e2d9d6a1-8601-4d58-8952-62f6229d13f2   7m
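
The verification in step 6 can also be scripted with `kubectl wait`, which blocks until a condition is met or a timeout expires. This is a sketch only: the function name `wait_for_tls` and the default timeout are hypothetical, and it assumes both certificate kinds expose a `Ready` condition, as the READY columns above suggest.

```shell
# Hypothetical helper: wait until every kcert and every cert-manager
# Certificate in a namespace reports Ready.
wait_for_tls() {
  local ns="$1" timeout="${2:-300s}"

  # Knative Certificates first, then cert-manager Certificates.
  # `kubectl wait` returns non-zero on timeout or when nothing matches.
  kubectl wait certificates.networking.internal.knative.dev --all -n "$ns" \
    --for=condition=Ready --timeout="$timeout" &&
  kubectl wait certificates.cert-manager.io --all -n "$ns" \
    --for=condition=Ready --timeout="$timeout"
}
```

Example: `wait_for_tls default 120s`. A failure on the second command while the first succeeds would be unexpected; a failure on both matches the situation reported at the top of this issue.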

nak3 avatar Sep 12 '22 02:09 nak3

"I believe I may have located the source of the problem. Despite having configured the ClusterIssuer letsencrypt in the KnativeServing resource, the config-certmanager configmap still has an example configuration."

Regarding the config-certmanager issue you mentioned above: config-certmanager (which is part of net-certmanager) is not deployed by the operator, so you need to configure it directly rather than through the KnativeServing CR.
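
Configuring the ConfigMap directly could look like the sketch below, which sets a top-level `issuerRef` key (the one shown commented out under `_example`) with a merge patch. The function names `issuer_patch` and `set_issuer` are hypothetical, and the issuer kind and name are whatever exists on the cluster; `ClusterIssuer` and `letsencrypt` are taken from the KnativeServing spec earlier in this thread.

```shell
# Hypothetical helper: build the JSON merge patch that sets issuerRef.
issuer_patch() {
  local kind="$1" name="$2"
  # The ConfigMap value is itself a small YAML document, so its line breaks
  # are written as JSON escapes (a literal backslash followed by n).
  printf '{"data":{"issuerRef":"kind: %s\\nname: %s\\n"}}' "$kind" "$name"
}

# Hypothetical helper: apply the patch to config-certmanager.
set_issuer() {
  kubectl patch configmap config-certmanager -n knative-serving \
    --type merge -p "$(issuer_patch "$1" "$2")"
}
```

Example: `set_issuer ClusterIssuer letsencrypt`, then `kubectl describe configmaps -n knative-serving config-certmanager` should show `issuerRef` as its own key next to `_example`.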

nak3 avatar Sep 12 '22 03:09 nak3

Having manually deployed the config-certmanager, HTTP-01 verification still has the same issue though I can now get certificates using DNS-01 verification.

I'm also facing another problem getting this to work on a user-friendly domain instead of the service.namespace.mydomain.tld one. Specifically, the DomainMapping remains unready with a "Waiting for load balancer to be ready" message. Trying to connect to the service in this state returns the following HTTP response, which I think I also saw at one point while trying to preview the HTTP-01 verification URL. Could this be related?

upstream connect error or disconnect/reset before headers. reset reason: connection failure

girlpunk avatar Sep 12 '22 20:09 girlpunk

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Dec 12 '22 01:12 github-actions[bot]

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle stale

knative-prow-robot avatar Jan 11 '23 01:01 knative-prow-robot

/remove-lifecycle rotten

girlpunk avatar Jan 16 '23 16:01 girlpunk

/reopen

girlpunk avatar Jan 16 '23 16:01 girlpunk

Just quickly hopping in to let you know I also had this issue. If you remove the

http-protocol: Redirected

from the configmap, cert-manager is able to issue the certificates. If I enable it, the 301 redirect breaks certificate issuance.

So if you leave it on the default, your Services will listen on both 80 and 443, and you have to enforce the HTTPS redirect somewhere else.

Hope this helps. BR

Edit: As soon as the certs are ready you can add the redirect again, but doing so will break cert-manager's certificate renewal mechanism.
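
Whether the redirect is in play can be checked from outside the cluster by requesting a path under `/.well-known/acme-challenge/` over plain HTTP and looking at the status code. The following is a sketch under assumptions: the function names are hypothetical, the `probe` token is a placeholder rather than a real challenge token, and the patch assumes the operator-managed setting lives under `spec.config.network` with `Enabled` as the non-redirecting value of `http-protocol`.

```shell
# Hypothetical helper: classify the HTTP status returned for a challenge URL.
classify_challenge_status() {
  case "$1" in
    301|302|307|308) echo "redirected" ;;
    200|404)         echo "not redirected" ;;
    *)               echo "unexpected" ;;
  esac
}

# Hypothetical helper: request a placeholder challenge path over plain HTTP.
probe_challenge() {
  local host="$1" code
  # -o /dev/null discards the body; -w prints only the status code.
  code="$(curl -s -o /dev/null -w '%{http_code}' \
    "http://${host}/.well-known/acme-challenge/probe")"
  echo "$code $(classify_challenge_status "$code")"
}

# Hypothetical helper: switch the operator-managed setting away from Redirected.
allow_plain_http() {
  kubectl patch knativeservings.operator.knative.dev knative-serving \
    -n knative-serving --type merge \
    -p '{"spec":{"config":{"network":{"http-protocol":"Enabled"}}}}'
}
```

Example: `probe_challenge earthwalker.earthwalker.knative.my-domain-here.com`. A `301 redirected` result would match the behaviour described in this comment, in which case `allow_plain_http` is one way to try the workaround.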

Befisch avatar Jan 31 '23 11:01 Befisch

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar May 02 '23 01:05 github-actions[bot]