Daniel Vega-Myhre
Small bug fix where `true` should be `false`.
## Description The purpose of these changes is to add the k8s manifests corresponding to our guide in the public docs on [Orchestrating TPU MultiSlice workloads using JobSet and Kueue](https://cloud.google.com/kubernetes-engine/docs/tutorials/tpu-multislice-kueue)....
I recently submitted a PR updating the image digest for a tag: https://github.com/kubernetes/k8s.io/pull/5220 However, I noticed the image digest for registry.k8s.io/jobset/jobset:v0.1.1 never updated, and after discussion with @BenTheElder I learned...
We would like to publish a blog post introducing [JobSet](https://jobset.sigs.k8s.io/), a K8s native API for distributed ML training and HPC workloads. cc @ahg-g @kannon92 I think we still need to...
- One-line PR description: Configurable Job failure reason for PodFailurePolicy - Issue link: #4443 - Other comments:
### Enhancement Description - One-line enhancement description (can be used as a release note): Add an optional `Reason` field to the Job `PodFailurePolicyRule`, which allows the user to specify the...
### Enhancement Description - One-line enhancement description (can be used as a release note): Add Pod Index Label for StatefulSets and Indexed Jobs. - Kubernetes Enhancement Proposal: - Discussion Link:...
**What would you like to be added**: A comprehensive example showing how to run a training workload on GPUs using JobSet. We could have one example per major cloud provider....
**What would you like to be added**: Integration tests for changes in #562 **Why is this needed**: Improving test coverage
**What would you like to be added**: Add support for feature gates in JobSet, like those used in upstream k8s and Kueue [here](https://github.com/kubernetes-sigs/kueue/blob/main/pkg/features/kube_features.go). **Why is this needed**: - Allow customer...