bundle-kubeflow
Add `mpijobs.kubeflow.org` CRD
This PR adds a CRD which may not get created during training-operator's upgrade process. The issue is described here. The CRD needs to be stored somewhere in our repo so that the upgrade docs can reference it in the known post-upgrade issues section. I'm open to suggestions if you think it should live elsewhere.
Hi @natalian98, thanks for filing canonical/training-operator#44 and proposing a fix for it here. I think that we probably want to move away from `patch` and `create` and instead use `apply` in our charms. We have defined some patterns in `chisme` that move away from an imperative approach, and I think we could use some of that logic in `training-operator`, without needing to migrate it, though we want to do it soon.
That being said, I would say a better approach for ensuring a smoother upgrade would be to change what we do on config change:
    try:
        ...
        self.unit.status = MaintenanceStatus("Patching CRDs")
        self._apply_resource(resource_type="crds")
This would imply changing `patch_resources` to use `client.apply` instead.
What do you think? Would this approach work for upgrades?
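To make the difference concrete, here is a minimal, self-contained sketch of why an `apply`-style call could be more robust than `patch` during an upgrade. It assumes a hypothetical in-memory `FakeClient` standing in for the Kubernetes client; the class, the exceptions, the helper functions and the `existing.example.org` name are illustrative only and are not the charm's code or the real client API.

```python
# Hypothetical in-memory stand-in for a Kubernetes client.
# It only mimics create/patch/apply semantics for illustration.
class NotFound(Exception):
    """Raised when patching a resource that does not exist."""


class Conflict(Exception):
    """Raised when creating a resource that already exists."""


class FakeClient:
    def __init__(self):
        self.store = {}

    def create(self, name, spec):
        # Imperative: fails if the resource is already present.
        if name in self.store:
            raise Conflict(name)
        self.store[name] = dict(spec)

    def patch(self, name, spec):
        # Imperative: fails if the resource is missing.
        if name not in self.store:
            raise NotFound(name)
        self.store[name].update(spec)

    def apply(self, name, spec):
        # Declarative: create if missing, otherwise set the desired state.
        self.store[name] = dict(spec)


def patch_only_upgrade(client, crds):
    """Patch every CRD; return the names that could not be patched."""
    missing = []
    for name, spec in crds.items():
        try:
            client.patch(name, spec)
        except NotFound:
            missing.append(name)
    return missing


def apply_upgrade(client, crds):
    """Apply every CRD, regardless of whether it already exists."""
    for name, spec in crds.items():
        client.apply(name, spec)
```

With a cluster that already holds one CRD and a new charm revision that ships an additional one (such as `mpijobs.kubeflow.org`), `patch_only_upgrade` reports the new CRD as missing and leaves it uncreated, while `apply_upgrade` ends with both present. Whether the real charm hits exactly this path depends on how its hooks are wired, which this sketch does not model.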
Thanks a lot for checking this @dnplas. It's a great idea to use `apply` instead of `create` and `patch`, especially on upgrade. What worries me is the install part: currently, when this charm is re-deployed, the CRD should be created on the install event. In these logs, line 177 shows that the install hook was run, but the CRD was still not created.
The upgrade in training-operator was redesigned. The upgrade process was tested and the installed CRDs were verified. Closing this PR.