Generate unique name for training job resources
If we create the following SageMaker resource, it silently fails when a training job named test-training-job already exists:
```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  generateName: test-training-job
spec:
  trainingJobName: test-training-job
```
It would be nice if the controller had generateName-like functionality to ensure a unique name. We create TrainingJobs through an Argo Workflow (the workflow submits the Kubernetes manifest), and on failure we retry n times. However, we cannot update the trainingJobName field in the manifest on each retry, so the second attempt always fails because a training job with the same name already exists.
@dfarr The problem with controller-generated names is that they essentially make the resource impossible to manage with a fully declarative, GitOps-style configuration management system. The Spec of a resource with a generated name doesn't actually represent the desired state of that resource; it is a template for the desired state of one or more resources. For example, a GitOps controller cannot tell when the desired state of a resource has changed, because it doesn't know the resource's actual name, only the generated-name template for instances of that resource type.
This is the classic conflict between imperative-style APIs like SageMaker and declarative-style APIs like Kubernetes. We may need a separate TrainingJobTemplate resource that supports generated names, and have the SageMaker ACK controller treat those resources the way the built-in Kubernetes Deployment controller treats the Pod template (Spec.Template) in a Deployment.
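To make the template idea concrete, here is a minimal sketch, assuming a hypothetical TrainingJobTemplate whose spec.template.spec holds an ordinary TrainingJob spec. The function names render_training_job and template_hash and the naming scheme are illustrative assumptions, loosely in the spirit of how a Deployment derives ReplicaSet names from a hash of its Pod template; this is not an existing ACK API. The template stays the declarative object that GitOps tracks, while each concrete TrainingJob gets a deterministic name built from the template name, a hash of the spec, and an attempt number.

```python
import copy
import hashlib
import json

MAX_BASE_LENGTH = 40  # leave room for "-<hash>-<attempt>" inside a 63-character name


def template_hash(spec: dict) -> str:
    """Short, stable hash of a spec; sort_keys makes it independent of key order."""
    encoded = json.dumps(spec, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()[:10]


def render_training_job(template: dict, attempt: int) -> dict:
    """Stamp out one concrete TrainingJob from a hypothetical TrainingJobTemplate.

    The child name is derived from the template name, a hash of the template
    spec, and the attempt number, so it is deterministic (same inputs, same
    name) yet changes when the spec changes or a retry is requested.
    """
    base = template["metadata"]["name"][:MAX_BASE_LENGTH]
    spec = copy.deepcopy(template["spec"]["template"]["spec"])  # never mutate the template
    name = f"{base}-{template_hash(spec)}-{attempt}"
    spec["trainingJobName"] = name
    return {
        "apiVersion": "sagemaker.services.k8s.aws/v1alpha1",
        "kind": "TrainingJob",
        "metadata": {"name": name},
        "spec": spec,
    }
```

Rendering attempts 0 and 1 of the same template yields two distinct trainingJobName values, while re-rendering attempt 0 produces an identical manifest, which is what would let a controller reconcile idempotently.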
A generateName-like name could be resolved by a mutating webhook before the resource reaches the SageMaker ACK controller. Once the name has been set it will not change, so the ACK controller would only ever see the static name field.
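A minimal sketch of the webhook's core logic follows, assuming an admission.k8s.io/v1 AdmissionReview and treating the presence of metadata.generateName as the opt-in signal (an assumption, not ACK behavior). The helper names mutate and generated_training_job_name are hypothetical, and the HTTPS server, TLS setup, and MutatingWebhookConfiguration are omitted. Note that on CREATE the API server usually has not resolved generateName yet when mutating webhooks run (metadata.name is still empty), so the sketch generates its own name instead of copying metadata.name.

```python
import base64
import json
import secrets

ALPHANUMS = "bcdfghjklmnpqrstvwxz2456789"  # alphabet Kubernetes uses for generateName suffixes
MAX_NAME_LENGTH = 63
SUFFIX_LENGTH = 5


def generated_training_job_name(prefix: str) -> str:
    """Prefix plus a random suffix, truncated to fit a 63-character name."""
    suffix = "".join(secrets.choice(ALPHANUMS) for _ in range(SUFFIX_LENGTH))
    return prefix[: MAX_NAME_LENGTH - SUFFIX_LENGTH] + suffix


def mutate(review: dict) -> dict:
    """Build an AdmissionReview response for a TrainingJob (hypothetical handler).

    Only CREATE requests whose object sets metadata.generateName are patched,
    so the name is fixed once and later updates leave it untouched.
    Assumes the object already has a spec (JSON Patch "add" needs /spec).
    """
    request = review["request"]
    obj = request["object"]
    response = {"uid": request["uid"], "allowed": True}

    prefix = obj.get("metadata", {}).get("generateName")
    if request.get("operation") == "CREATE" and prefix:
        # "add" on an existing member replaces it, so a static
        # spec.trainingJobName in the submitted manifest is overwritten.
        patch = [{
            "op": "add",
            "path": "/spec/trainingJobName",
            "value": generated_training_job_name(prefix),
        }]
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

With this in place, each Argo retry that resubmits the same manifest would get a fresh trainingJobName, and the ACK controller would only ever see the already-resolved value.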
Kubernetes itself does this for resources that support metadata.generateName. For example, if I create the following Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: test-
spec:
  containers:
  - name: main
    image: python:3.7
    command: [sleep, '999']
```
and then run kubectl get -o yaml pod/test-8x6m2, I see the fully resolved metadata.name field in the returned object.
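The resolved name in that example (test-8x6m2) is the generateName prefix plus a five-character random suffix. A minimal sketch of the same scheme follows, roughly mirroring Kubernetes' simple name generator (prefix truncated so the result fits in 63 characters, suffix drawn from a vowel-free alphabet). The function names are hypothetical, and the SageMaker name pattern used for the check is an assumption to verify against the CreateTrainingJob API reference.

```python
import re
import secrets

# Alphabet used by Kubernetes' random-string helper: no vowels (so suffixes don't
# spell words) and no easily confused characters such as 0/o or 1/l.
ALPHANUMS = "bcdfghjklmnpqrstvwxz2456789"
MAX_NAME_LENGTH = 63  # limit used by the Kubernetes generator; also assumed for SageMaker job names
RANDOM_LENGTH = 5

# Assumed SageMaker TrainingJobName pattern: alphanumeric start, hyphens only between alphanumerics.
SAGEMAKER_NAME = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


def generate_name(base: str) -> str:
    """Append a random suffix to base, truncating base so the result fits in 63 chars."""
    base = base[: MAX_NAME_LENGTH - RANDOM_LENGTH]
    suffix = "".join(secrets.choice(ALPHANUMS) for _ in range(RANDOM_LENGTH))
    return base + suffix


def is_valid_training_job_name(name: str) -> bool:
    """Check a candidate against the (assumed) SageMaker name rules."""
    return len(name) <= MAX_NAME_LENGTH and SAGEMAKER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # Each call yields a new name, so a retried submission would not collide.
    for _ in range(3):
        name = generate_name("test-training-job-")
        print(name, is_valid_training_job_name(name))
```

A trailing hyphen in the prefix (test- rather than test) is what keeps the suffix visually separate, which is why the Pod example above uses generateName: test-.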
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Provide feedback via https://github.com/aws-controllers-k8s/community.
/close
@eks-bot: Closing this issue.