scylla-operator Manage TLS certificates

Description of your changes: This PR adds a control loop to ScyllaCluster controller that will automatically create both serving and client TLS certificates. This will be added to all ScyllaClusters but as it is a big change the feature is disabled by default and can be enabled with a feature gate flag --feature-gates=AutomaticTLSCertificates=true on the scylla operator deployment. In the future CQL over TLS will become the default and the bare CQL will be disabled by default for security reasons. There is still a long way there though.

Which issue is resolved by this Pull Request: Resolves #256

TODO:

[x] Add e2e for cert rotation
[x] https://github.com/scylladb/scylla-operator/pull/1039
[ ] https://github.com/scylladb/scylla-operator/pull/1070
[ ] Add new required check for alpha job when this PR merges

Aug 29 '22 20:08 tnozicka

thanks, for the review

(first round of comments addressed)

Sep 26 '22 14:09 tnozicka

@tnozicka are the new feature flags going to be expose in the helm chart ? (i.e. in SCT we are deploying using the helm chart)

Sep 29 '22 14:09 fruch

@tnozicka are the new feature flags going to be expose in the helm chart ? (i.e. in SCT we are deploying using the helm chart)

Not in this PR but I'd be fine with a followup if you really want it. Or just use kubectl patch.

Sep 30 '22 07:09 tnozicka

@tnozicka are the new feature flags going to be expose in the helm chart ? (i.e. in SCT we are deploying using the helm chart)

Not in this PR but I'd be fine with a followup if you really want it. Or just use kubectl patch.

Why not in this PR? Operator's new config option gets added exactly here. What is the reason to intentionally make the facade be more and more incompatible?

Upd: after talking in chat we agreed that it will not be done now. Users who will want to use TLS from the box will be able to enable it by using "kubectl patch" command after the operator is installed by any means (helm or manifests).

Sep 30 '22 07:09 vponomaryov

@tnozicka I've taken an image from this PR, and one enabled this feature, I'm getting those errors:

W1002 08:39:09.105631       1 cache/reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope
E1002 08:39:09.105682       1 cache/reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Ingress: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope

Oct 02 '22 09:10 fruch

@tnozicka I've taken an image from this PR, and one enabled this feature, I'm getting those errors:

W1002 08:39:09.105631       1 cache/reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope
E1002 08:39:09.105682       1 cache/reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Ingress: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope

o.k. seem like it was stupid me, I was using old 1.7.2 helm chart with the image from this PR...

Oct 02 '22 15:10 fruch

I forgot to sort the cert inputs, that explains the latest flake

                IPAddresses: []net.IP{
        -               s"10.101.162.44",
        +               s"10.102.200.4",
        -               s"10.102.200.4",
        +               s"10.101.162.44",
                },

@zimnx fixed squashed, the diff is https://github.com/scylladb/scylla-operator/compare/95421b6137079c4e7fa657fefba0e6317db81730..063205ff4e8e5b382505bf9352908bf5f4c946c1#diff-49aec2779932d30a099911e2a0b13b1bffe6207ff60410611087df5c74823c7dR114

Oct 03 '22 14:10 tnozicka

another notable change is aggregating conditions and verifying them https://github.com/scylladb/scylla-operator/compare/cb6c29f846290e8353521fa25fa7ea601c41862c..1871d38b04dde4723d83ae70fc122eb579ee7a41

This is needed for future work that will add a scaling condition whose aggregation will prevent considering a rollout done when scaling down before certcontroller has a chance to set degraded.

Oct 10 '22 14:10 tnozicka

scylla-operator scylla-operator copied to clipboard

Manage TLS certificates

scylla-operator
scylla-operator copied to clipboard