flink-on-k8s-operator
How to enforce SSL/TLS everywhere through this operator?
Hey everyone, I've been trying this operator successfully on OpenShift after making a few small changes and applying a workaround (https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/issues/288) in order to use Flink 1.11.
Now I'd like to check that I can use SSL/TLS everywhere as per https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html. I had a look through https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/3352bf51c0d3167ba87a626cf5d6ef37753b8c57/docs/crd_v1alpha1.md and I noticed there's useTLS for the Ingress endpoint (I assume for external access, so perhaps securing the Flink UI?) but I don't see anything for internal communications.
Is it possible to achieve this through the operator and if so, how? I don't see it listed as a supported feature in the main README, but I am thinking it would be done through an override in here for the FlinkCluster CR:
spec:
  flinkProperties:
I'm wondering if anyone's done this before. I'll have a try anyway and see what happens, but I couldn't find any documentation on this for the operator itself (let me know if I've missed something, please), hence my curiosity in case it's something not yet available.
Thanks!
Update: you can do it. Make the keystore/truststore etc. upfront first, then create a secret and mount it in. I don't mind any of these values being known (just testing on my laptop).
kind: FlinkCluster
metadata:
  name: tls-flink-cluster-1-11
spec:
  jobManager:
    volumeMounts:
      - name: flink-secret-volume
        mountPath: /etc/flink-secrets
    volumes:
      - name: flink-secret-volume
        secret:
          secretName: flink-tls-secret
    accessScope: Cluster
    resources:
      limits:
        memory: 600Mi
        cpu: "1.0"
  taskManager:
    volumeMounts:
      - name: flink-secret-volume
        mountPath: /etc/flink-secrets
    volumes:
      - name: flink-secret-volume
        secret:
          secretName: flink-tls-secret
    replicas: 1
    resources:
      limits:
        memory: 1Gi
        cpu: "1.0"
  image:
    name: flink:scala_2.12-java8
  flinkProperties:
    # https://ci.apache.org/projects/flink/flink-docs-stable/ops/security-ssl.html is helpful for this part.
    web.submit.enable: "false"
    taskmanager.numberOfTaskSlots: "1"
    jobmanager.heap.size: "" # set empty value (only for Flink version 1.11 or above)
    jobmanager.memory.process.size: 1gb # job manager memory limit (only for Flink version 1.11 or above)
    taskmanager.heap.size: "" # set empty value
    taskmanager.memory.process.size: 1gb # task manager memory limit
    security.ssl.internal.enabled: "true"
    security.ssl.internal.keystore: /etc/flink-secrets/internal-keystore.p12
    security.ssl.internal.truststore: /etc/flink-secrets/internal-keystore.p12
    security.ssl.internal.keystore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.internal.truststore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.internal.key-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.rest.enabled: "true"
    security.ssl.rest.keystore: /etc/flink-secrets/rest-keystore.p12
    security.ssl.rest.truststore: /etc/flink-secrets/ca-truststore.p12
    security.ssl.rest.keystore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.rest.truststore-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
    security.ssl.rest.key-password: DD562D1B-742F-45AB-9228-98874C356076 # Replace with generated password
I made the files upfront and have them in a secret with the following format:
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: flink-tls-secret
data:
  ca-keystore.p12: $(cat ./certs/ca-keystore.p12 | base64 | tr -d '\n')
  ca-truststore.p12: $(cat ./certs/ca-truststore.p12 | base64 | tr -d '\n')
  internal-keystore.p12: $(cat ./certs/internal-keystore.p12 | base64 | tr -d '\n')
  rest-keystore.p12: $(cat ./certs/rest-keystore.p12 | base64 | tr -d '\n')
  store-password.txt: $(cat ./certs/store-password.txt | base64 | tr -d '\n')
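Note that the $(...) substitutions in the manifest above are only expanded if the text is passed through a shell first (for example via a heredoc); YAML itself has no command substitution. As an alternative, here is a minimal Python sketch that renders the same Secret using only the standard library. The function names are made up for illustration, and the ./certs directory layout is assumed from the manifest above. The output is JSON, which kubectl accepts in place of YAML.

```python
import base64
import json
from pathlib import Path


def build_secret_manifest(name, files):
    """Return a Kubernetes Secret manifest (as a dict) for the given files.

    `files` maps the key name inside the Secret to the raw bytes of the file.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": name},
        # Values under `data` must be base64-encoded, without line breaks.
        "data": {
            key: base64.b64encode(content).decode("ascii")
            for key, content in sorted(files.items())
        },
    }


def load_cert_files(cert_dir):
    """Read every regular file in `cert_dir` into a {filename: bytes} dict."""
    return {p.name: p.read_bytes() for p in Path(cert_dir).iterdir() if p.is_file()}


def main(cert_dir="./certs", name="flink-tls-secret"):
    """Print the rendered manifest; assumes the stores already exist in cert_dir."""
    manifest = build_secret_manifest(name, load_cert_files(cert_dir))
    print(json.dumps(manifest, indent=2))
```

Calling main() from a small script and piping its output into kubectl apply -f - should create the same secret as the templated manifest.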
Thanks for your question! The operator has no first-class support for SSL/TLS. If you have successfully configured it through flinkProperties, it would be nice if you could share your experience by adding a section to the user guide. Thank you!
Thanks @functicons, good to know! I'll be happy to share what a colleague at IBM and I have at the moment. We're currently trying to submit a job to the Job Manager (by port-forwarding and doing a normal flink run) and seeing problems though, so while it may be secure* it's not so useful yet without good docs. Do you know of a good way to verify this?
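One rough way to check whether a port-forwarded endpoint actually speaks TLS is to attempt a handshake against it. The sketch below is only an assumption-laden starting point: probe_tls is a made-up helper, it deliberately skips certificate verification, and it is aimed at the REST endpoint rather than the internal ports, which require mutual authentication. It also returns None when the endpoint is simply unreachable, so a None result is not proof of plain HTTP.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Try a TLS handshake against host:port.

    Returns the negotiated protocol version (for example "TLSv1.2") if the
    endpoint speaks TLS, or None if the connection or handshake fails (for
    example because the endpoint serves plain HTTP or is unreachable).
    """
    # Verification is disabled on purpose: this only checks that TLS is
    # spoken at all, not that the certificate is trustworthy.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version()
    except (ssl.SSLError, OSError):
        return None
```

For example, with the Job Manager REST port forwarded to localhost:8081 (the port number is an assumption), probe_tls("localhost", 8081) returning a version string would suggest that security.ssl.rest.enabled has taken effect.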
@SparkX120, a colleague at IBM, has suggested the following as an improvement to the CR, instead of needing to specify all of the options; worth mentioning here, I think:
FlinkCluster:
  metadata:
    name: my-cluster
  spec:
    ha:
      enabled: true
    tls:
      enabled: true
      existingTlsSecret: secret-name
    taskManager:
      replicas: 4
      memory: 4Gi
      taskSlots: 8
Just in terms of simplicity, I find specifying something as enabled as
tls: {}
rather than
tls:
  enabled: true
is cleaner and easier to add. Then if someone wants access to additional configuration they open up the section. So
tls:
  existingTlsSecret: secret-name
  renewal:
  dnsNames:
etc.
The same can be done with ha.
Additionally, when referencing a secret, there is a Kubernetes standard. So
tlsSecret:
  secretName: my-secret
or
tlsKey:
  secretName: my-secret
  key: tls.key
tlsCert:
  secretName: my-secret
  key: tls.crt
You don't always want to work with Kubernetes secrets for this; you can use a vault for the certificate/passphrase. It can be achieved using either:
- init container (that downloads the certificate from the vault)
- mapping of the secret to a volume
and the needed Flink configuration.
I think issue #383 is important, but I'm not sure the "tls" config is required here; maybe an example for a cluster with SSL can be enough.
Either way I think it's important to keep the possibility to support all possible certificate gathering solutions.
What do you think? @a-roberts @chrispatmore @EnriqueL8 @functicons
Great feedback and suggestions, so...
init container (that downloads the certificate from the vault)
I actually tried this approach, but the environment I am working in is OpenShift with OLM, and the way I coded this init container approach caused problems (since I was modifying things at runtime for my own deployment spec, which is managed by OLM).
I've updated one of my posts above just to mention what I eventually got working: making the secret upfront and mounting it in through a volume, with the needed Flink configuration.
Either way I think it's important to keep the possibility to support all possible certificate gathering solutions.
Absolutely, having a maintained set of examples, using our existing CR definitions, would be really helpful.
I agree, supporting multiple ways of configuring TLS is important. One such way could be integrating with cert-manager (https://cert-manager.io/), which is becoming a popular way of managing certificates in Kubernetes. But it is by no means the only or necessarily always the "best" way.
For me, what would be nice is first-class support for enabling and configuring TLS in a user-friendly way, such that, for example, I could specify
tls: {}
and have TLS turned on everywhere with self-signed certificates.
Or I could expand that section and specify where and how the cluster should retrieve its certificates.
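To make the proposed semantics concrete, here is a small Python sketch of how "presence means enabled" could be resolved. Everything in it is hypothetical: the operator defines no tls section today, and the selfSigned default and the resolve_tls helper are invented purely for illustration (only existingTlsSecret comes from the suggestion earlier in this thread).

```python
# Hypothetical defaults; the operator does not define any of these fields.
TLS_DEFAULTS = {"selfSigned": True, "existingTlsSecret": None}


def resolve_tls(spec):
    """Return the effective TLS settings for a CR spec, or None if TLS is off.

    The mere presence of a `tls` key (even `tls: {}`) turns TLS on; any keys
    set inside it override the defaults.
    """
    if "tls" not in spec:
        return None
    settings = dict(TLS_DEFAULTS)
    # A bare `tls:` in YAML parses to None, so treat it like an empty section.
    settings.update(spec["tls"] or {})
    # Supplying an existing secret means no self-signed certificates are needed.
    if settings["existingTlsSecret"]:
        settings["selfSigned"] = False
    return settings
```

With this shape, a spec without a tls key leaves TLS off, tls: {} yields the self-signed defaults, and an expanded section only overrides the fields it names.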
@functicons
Thanks for your question! The operator has no first class support for SSL/TLS. If you have successfully configured it through flinkProperties, it would be nice if you can share your experience by adding a section to the user guide. Thank you!
- SSL for internal communication works fine once the required configuration is added to the Flink configuration.
- We also tried to configure SSL for external (REST) connectivity, but it didn't work with the operator. As you may know, the two places below use plain http, and there is currently no way to supply the certificate to the operator, so it failed to watch the status of the JM and to submit the job.
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/controllers/flinkclient/http_client.go
https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/0310df76d6e2128cd5d2bc51fae4e842d370c463/controllers/flinkcluster_submit_job_script.go#L61
This is important because any k8s user who knows the clusterIP can submit a job from any other container in the same k8s cluster namespace, even if we restrict access to the Flink web UI with Ingress authentication. Do you have any suggestions for this?
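For illustration only, this is roughly what a TLS-aware REST client needs: a CA certificate to verify the server and, if the server also authenticates clients, a client certificate and key. The sketch is in Python and is not the operator's Go code; the helper names are made up, and it assumes the CA and client certificates have been exported from the PKCS12 stores to PEM files. The Flink SSL documentation linked at the top of this thread describes a security.ssl.rest.authentication-enabled option for mutual authentication on the REST endpoint, which may help with the concern about arbitrary pods submitting jobs, though I have not tried it with the operator.

```python
import json
import ssl
import urllib.request


def build_rest_ssl_context(ca_pem=None, client_cert_pem=None, client_key_pem=None):
    """Build an SSLContext for talking to a TLS-enabled Flink REST endpoint.

    ca_pem: path to the CA certificate in PEM format (None = system trust store).
    client_cert_pem / client_key_pem: optional client certificate and key, only
    needed when the server also authenticates clients (mutual TLS).
    """
    # create_default_context enables certificate and hostname verification.
    context = ssl.create_default_context(cafile=ca_pem)
    if client_cert_pem:
        context.load_cert_chain(certfile=client_cert_pem, keyfile=client_key_pem)
    return context


def get_cluster_overview(base_url, context, timeout=10.0):
    """Fetch and decode the JSON document served at <base_url>/overview."""
    url = base_url.rstrip("/") + "/overview"
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return json.load(resp)
```

A status watcher built along these lines would call get_cluster_overview with an https:// base URL; the operator's http client and job submit script would need an equivalent way to receive the CA (and possibly a client certificate) before REST SSL can be enabled.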