
Add `mpijobs.kubeflow.org` CRD

Open natalian98 opened this issue 1 year ago • 2 comments

This PR adds a CRD that may not get created during training-operator's upgrade process. The issue is described here. It needs to be stored somewhere in our repo so that it can be referenced in the upgrade docs, in the known post-upgrade issues section. I'm open to suggestions if you think it should live elsewhere.

natalian98, Aug 19 '22 13:08

Hi @natalian98, thanks for filing canonical/training-operator#44 and proposing a fix for it here. I think we probably want to move away from patch and create and use apply in our charms instead. We have defined some patterns in chisme that move away from an imperative approach, and I think we could use some of that logic in training-operator without needing to migrate it, though we want to do that soon. That said, a better approach for ensuring a smoother upgrade would be to change what we do on config change:

        try:
            ...
            self.unit.status = MaintenanceStatus("Patching CRDs")
            self._apply_resource(resource_type="crds")

This would imply changing patch_resources to use client.apply instead.
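
To illustrate the difference between the create/patch approach and apply, here is a minimal, self-contained sketch. It does not use the charm's real code or a real Kubernetes client: `FakeClient`, `ensure_crds_imperative`, and `ensure_crds_declarative` are hypothetical names, the in-memory dictionary stands in for the cluster, and the CRD specs are placeholders. It only assumes that apply behaves as "create if missing, otherwise update".

```python
class AlreadyExists(Exception):
    """Raised by create() when the named object is already present."""


class NotFound(Exception):
    """Raised by patch() when the named object does not exist."""


class FakeClient:
    """Hypothetical in-memory stand-in for a Kubernetes client."""

    def __init__(self):
        self.objects = {}

    def create(self, name, spec):
        # Imperative: fails if the object already exists.
        if name in self.objects:
            raise AlreadyExists(name)
        self.objects[name] = dict(spec)

    def patch(self, name, spec):
        # Imperative: fails if the object does not exist yet.
        if name not in self.objects:
            raise NotFound(name)
        self.objects[name].update(spec)

    def apply(self, name, spec):
        # Declarative: create if missing, otherwise update.
        self.objects[name] = dict(spec)


def ensure_crds_imperative(client, crds):
    """Caller has to handle both starting states explicitly."""
    for name, spec in crds.items():
        try:
            client.create(name, spec)
        except AlreadyExists:
            client.patch(name, spec)


def ensure_crds_declarative(client, crds):
    """The same call works on a fresh install and on an upgrade."""
    for name, spec in crds.items():
        client.apply(name, spec)


# Placeholder specs, for illustration only.
desired = {
    "tfjobs.kubeflow.org": {"version": "v1"},
    "mpijobs.kubeflow.org": {"version": "v1"},
}

# Simulated upgrade: one CRD exists from the old revision, one is missing.
upgraded = FakeClient()
upgraded.create("tfjobs.kubeflow.org", {"version": "v1"})
ensure_crds_declarative(upgraded, desired)

# Simulated fresh install: nothing exists yet.
fresh = FakeClient()
ensure_crds_declarative(fresh, desired)

print(sorted(upgraded.objects))
print(upgraded.objects == fresh.objects)
```

In this sketch, a code path that only calls `patch` would raise `NotFound` for a CRD that was never created, while `apply` converges to the same state regardless of what was there before.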

What do you think? Would this approach work for upgrades?

DnPlas, Aug 26 '22 05:08

Thanks a lot for checking this, @dnplas. It's a great idea to use apply instead of create and patch, especially on upgrade. What worries me is the install part: currently, when this charm is re-deployed, the CRD should be created on the install event. In these logs, on line 177, we can see that the install hook ran, but the CRD was still not created.

natalian98, Aug 26 '22 13:08

The upgrade in training-operator was redesigned. The upgrade process was tested and the installed CRDs were verified. Closing this PR.

i-chvets, Jun 29 '23 14:06