training-operator icon indicating copy to clipboard operation
training-operator copied to clipboard

Add internal-cert-controller disable flag

Open Garvit-77 opened this issue 10 months ago • 20 comments

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged): Fixes #2049

Garvit-77 avatar Feb 06 '25 23:02 Garvit-77

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow[bot] avatar Feb 12 '25 11:02 google-oss-prow[bot]

@tenzen-y I have made the changes accordingly so please review them, thank you

Garvit-77 avatar Feb 13 '25 09:02 Garvit-77

I am wondering do we really need to have cert-manager installation ? Can we re-use OPA cert-manager rotator to give user flexibility to control certificate or cert-manager will give us additional features ? cc @kubeflow/release-team @kubeflow/wg-manifests-leads

I do not have a strong request for CertManager support. One thing is productionizing trainer since they sometimes want to use the certified certifications (OPA internal-certs generates self-signed certs).

As an alternative and minimized solution, we could consider supporting only disabling the OPA internal-cert manager and then giving them to specify arbitrary certifications.

tenzen-y avatar Feb 14 '25 13:02 tenzen-y

OPA internal-certs generates self-signed certs

Does OPA support only self-signed certs @tenzen-y ?

As an alternative and minimized solution, we could consider supporting only disabling the OPA internal-cert manager and then giving them to specify arbitrary certifications.

Yeah, this is what I was thinking as well. Like do we really need to explain cluster admins on how to configure cert manager to generate certs for Kubeflow Trainer webhook?

andreyvelich avatar Feb 14 '25 14:02 andreyvelich

FYI, as I can see even with Cert Manager right now we use self-signed certs by default: https://github.com/kubeflow/manifests/blob/master/apps/katib/upstream/installs/katib-cert-manager/certificate.yaml#L22

andreyvelich avatar Feb 14 '25 14:02 andreyvelich

OPA internal-certs generates self-signed certs

Does OPA support only self-signed certs @tenzen-y ?

Yes, that is OPA internal-certs objective.

As an alternative and minimized solution, we could consider supporting only disabling the OPA internal-cert manager and then giving them to specify arbitrary certifications.

Yeah, this is what I was thinking as well. Like do we really need to explain cluster admins on how to configure cert manager to generate certs for Kubeflow Trainer webhook?

Documentation might be better. @astefanutti Do you see any use cases where customers want to use certified certifications for admission webhook controllers?

tenzen-y avatar Feb 14 '25 14:02 tenzen-y

Do you see any use cases where customers want to use certified certifications for admission webhook controllers?

No, I haven't personally seen any customer requests / requirements to have full-fledged certificate management / PKI for admission webhooks.

cert-manager is more driven by the need to integrate with trusted certificate authority like let's encrypt for external ingress / gateway, not for in-cluster communication between control plane components.

Now I can understand that if cert-manager is already deployed, it can be used to manage internal certificates as well. But in that case, I'd also lean toward adding a CLI flag to disable cert-controller and document an external solution should be provided, such as cert-manager for example.

astefanutti avatar Feb 14 '25 14:02 astefanutti

cc @kubeflow/release-team @kubeflow/kubeflow-steering-committee @juliusvonkohout @franciscojavierarceo It looks like we don't really need to have cert-manager as a hard dependency for Kubeflow projects. So we can simplify Kubeflow control plane complexity: https://github.com/kubeflow/manifests/issues/2451 cc @brsolomon-deloitte @jbottum

andreyvelich avatar Feb 14 '25 16:02 andreyvelich

Do you see any use cases where customers want to use certified certifications for admission webhook controllers?

No, I haven't personally seen any customer requests / requirements to have full-fledged certificate management / PKI for admission webhooks.

cert-manager is more driven by the need to integrate with trusted certificate authority like let's encrypt for external ingress / gateway, not for in-cluster communication between control plane components.

Now I can understand that if cert-manager is already deployed, it can be used to manage internal certificates as well. But in that case, I'd also lean toward adding a CLI flag to disable cert-controller and document an external solution should be provided, such as cert-manager for example.

Thank you for getting back feedback! In that case, it would be better to add only flag to disable internal cert controller and then enhance our documentations how to use arbitrary certificates for the admission webhook controllers.

tenzen-y avatar Feb 14 '25 19:02 tenzen-y

Let me summarize for @Garvit-77: we decided not to support CertManger. So, if you are ok, could you convert this PR to just add a flag to disable and enable the internal cert controller?

tenzen-y avatar Feb 14 '25 19:02 tenzen-y

cc @kubeflow/release-team @kubeflow/kubeflow-steering-committee @juliusvonkohout @franciscojavierarceo It looks like we don't really need to have cert-manager as a hard dependency for Kubeflow projects. So we can simplify Kubeflow control plane complexity: kubeflow/manifests#2451 cc @brsolomon-deloitte @jbottum

There is also KFP and maybe others. But yes, we can remove it as soon as no one is using it anymore.

juliusvonkohout avatar Feb 14 '25 20:02 juliusvonkohout

@tenzen-y Surely , i did understood the conversation except about the OPA and I would be align with the community So just making the changes for a Flag in the Updated PR

Garvit-77 avatar Feb 14 '25 20:02 Garvit-77

There is also KFP and maybe others. But yes, we can remove it as soon as no one is using it anymore.

If KFP uses it in the same way that other Kubeflow projects is using, we can easily remove this dependency. @kubeflow/wg-pipeline-leads @rimolive @anishasthana @HumairAK @hbelmiro @mprahl do you know how cert-manager is currently used in KFP ?

andreyvelich avatar Feb 14 '25 22:02 andreyvelich

There is also KFP and maybe others. But yes, we can remove it as soon as no one is using it anymore.

If KFP uses it in the same way that other Kubeflow projects is using, we can easily remove this dependency. @kubeflow/wg-pipeline-leads @rimolive @anishasthana @HumairAK @hbelmiro @mprahl do you know how cert-manager is currently used in KFP ?

Signed certificates for webhooks are not a bad thing.

When to Use cert-manager:

  1. Automatic Certificate Management: If your webhooks require TLS and you want to automate the issuance and renewal of certificates, cert-manager can simplify this process significantly.
  2. Security: Using TLS for webhooks enhances security by encrypting the traffic between your services.
  3. Dynamic Environments: In environments where services are frequently created and destroyed, cert-manager can help manage certificates dynamically.

juliusvonkohout avatar Feb 17 '25 10:02 juliusvonkohout

cc @thesuperzapper

juliusvonkohout avatar Feb 17 '25 10:02 juliusvonkohout

/retitle add internal-cert-controller disable flag

tenzen-y avatar Feb 17 '25 10:02 tenzen-y

/retitle Add internal-cert-controller disable flag

tenzen-y avatar Feb 17 '25 10:02 tenzen-y

@juliusvonkohout @andreyvelich Should we prepare the dedicated issue to discuss CertManager entirely KF? PR contributors might be confusing with that.

tenzen-y avatar Feb 17 '25 10:02 tenzen-y

I agree with @tenzen-y, I will create dedicated issue in kubeflow/manifests to discuss removal for cert-manager from the Kubeflow Manifests.

andreyvelich avatar Feb 17 '25 13:02 andreyvelich

Hey @tenzen-y @andreyvelich I was just reading the comments on the issue created in Kubeflow Manifests, but everyone has their own and different opinion yet the suggestion from thesupperzapper makes sense comment

Garvit-77 avatar Mar 03 '25 12:03 Garvit-77

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jun 01 '25 15:06 github-actions[bot]

This pull request has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

github-actions[bot] avatar Jun 21 '25 15:06 github-actions[bot]