scylla-operator
scylla-operator copied to clipboard
Manage TLS certificates
Description of your changes:
This PR adds a control loop to ScyllaCluster controller that will automatically create both serving and client TLS certificates. This will be added to all ScyllaClusters but as it is a big change the feature is disabled by default and can be enabled with a feature gate flag --feature-gates=AutomaticTLSCertificates=true
on the scylla operator deployment. In the future CQL over TLS will become the default and the bare CQL will be disabled by default for security reasons. There is still a long way there though.
Which issue is resolved by this Pull Request: Resolves #256
TODO:
- [x] Add e2e for cert rotation
- [x] https://github.com/scylladb/scylla-operator/pull/1039
- [ ] https://github.com/scylladb/scylla-operator/pull/1070
- [ ] Add new required check for alpha job when this PR merges
thanks, for the review
(first round of comments addressed)
@tnozicka are the new feature flags going to be expose in the helm chart ? (i.e. in SCT we are deploying using the helm chart)
@tnozicka are the new feature flags going to be expose in the helm chart ? (i.e. in SCT we are deploying using the helm chart)
Not in this PR but I'd be fine with a followup if you really want it. Or just use kubectl patch
.
@tnozicka are the new feature flags going to be expose in the helm chart ? (i.e. in SCT we are deploying using the helm chart)
Not in this PR but I'd be fine with a followup if you really want it. Or just use
kubectl patch
.
Why not in this PR? Operator's new config option gets added exactly here. What is the reason to intentionally make the facade be more and more incompatible?
Upd
: after talking in chat we agreed that it will not be done now.
Users who will want to use TLS from the box will be able to enable it by using "kubectl patch" command after the operator is installed by any means (helm or manifests).
@tnozicka I've taken an image from this PR, and one enabled this feature, I'm getting those errors:
W1002 08:39:09.105631 1 cache/reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope
E1002 08:39:09.105682 1 cache/reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Ingress: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope
@tnozicka I've taken an image from this PR, and one enabled this feature, I'm getting those errors:
W1002 08:39:09.105631 1 cache/reflector.go:424] k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope E1002 08:39:09.105682 1 cache/reflector.go:140] k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Ingress: failed to list *v1.Ingress: ingresses.networking.k8s.io is forbidden: User "system:serviceaccount:scylla-operator:scylla-operator" cannot list resource "ingresses" in API group "networking.k8s.io" at the cluster scope
o.k. seem like it was stupid me, I was using old 1.7.2 helm chart with the image from this PR...
I forgot to sort the cert inputs, that explains the latest flake
IPAddresses: []net.IP{
- s"10.101.162.44",
+ s"10.102.200.4",
- s"10.102.200.4",
+ s"10.101.162.44",
},
@zimnx fixed squashed, the diff is https://github.com/scylladb/scylla-operator/compare/95421b6137079c4e7fa657fefba0e6317db81730..063205ff4e8e5b382505bf9352908bf5f4c946c1#diff-49aec2779932d30a099911e2a0b13b1bffe6207ff60410611087df5c74823c7dR114
another notable change is aggregating conditions and verifying them https://github.com/scylladb/scylla-operator/compare/cb6c29f846290e8353521fa25fa7ea601c41862c..1871d38b04dde4723d83ae70fc122eb579ee7a41
This is needed for future work that will add a scaling condition whose aggregation will prevent considering a rollout done when scaling down before certcontroller has a chance to set degraded.