[Doc] Explain how to use a RayJob with Kueue and ProvisioningRequest despite GKE's single-PodSet limitation

Description

The documentation explains how to run RayJobs with Kueue and queued provisioning on GKE. However, the documented manifests only work when the RayJob has a head node and no workers. If you add workers, GKE rejects the ProvisioningRequest because it currently supports only a single PodSet per request.

This PR documents how to work around this limitation.

Related issues

Closes #57839

Additional information

I created a feature request in GKE's issue tracker to allow multiple PodSets per request: https://issuetracker.google.com/issues/452882313

Nov 29 '25 08:11 fg91

https://github.com/ray-project/ray/pull/59068 seems to be an incorrect reference.

Dec 01 '25 17:12 aslonnie

Thanks for the catch, the id was from the pr template. Fixed.

Dec 01 '25 17:12 fg91

cc @andrewsykim to review if you have time, thank you!

Dec 02 '25 16:12 Future-Outlier

The documented manifests only work when the RayJob only has a head node but no workers.

@fg91 I don't think this is true, assuming the head Pod doesn't request GPUs. Let me know if you see otherwise.

Dec 02 '25 20:12 andrewsykim

When I configure a head node without a GPU and a worker with a GPU, I see the error message mentioned in the linked issue:


Error creating ProvisioningRequest "rayjob-rayjob-sleep-test-062a5-dws-prov-1": admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
      Violations details: {"[denied by provisioning-request-cr-validation]":["the queued provisioning feature currently supports only single PodSet per request"]}

Dec 02 '25 22:12 fg91

@fg91 Thanks, let me share this internally at Google and check whether this is expected behavior. Can you share the output of your ProvisioningRequest?

kubectl get provisioningrequest rayjob-rayjob-sleep-test-062a5-dws-prov-1 -o yaml

Dec 03 '25 16:12 andrewsykim

@fg91 @andrewsykim Any progress here? I don't consider my comment a blocker by any means.

Dec 10 '25 10:12 mimowo

Andrew is on vacation

Dec 10 '25 14:12 Future-Outlier

@fg91 Regarding this comment: https://github.com/ray-project/ray/pull/59070#issuecomment-3604268162. Actually, when using ProvisioningRequest on GKE, you should exclude CPU completely from the ProvisioningRequestConfig by setting managedResources: nvidia.com/gpu, as shown here: https://kueue.sigs.k8s.io/docs/concepts/admission_check/provisioning_request/#provisioningrequestconfig
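To make the managedResources suggestion concrete, here is a minimal Python sketch. It assumes the Kueue v1beta1 ProvisioningRequestConfig fields (spec.provisioningClassName, spec.managedResources) and GKE's queued-provisioning.gke.io class; verify both against the linked Kueue docs. The config name "gke-gpu-only", the PodSet names, and the helper podsets_in_provisioning_request are hypothetical. The helper only models the documented behavior that, when managedResources is non-empty, PodSets requesting none of the managed resources are left out of the ProvisioningRequest. This is how a CPU-only head PodSet drops out and a single GPU PodSet remains.

```python
import json

# Hypothetical ProvisioningRequestConfig, expressed as a dict for illustration.
# The name "gke-gpu-only" is an assumption; field names follow the Kueue v1beta1 docs.
provisioning_request_config = {
    "apiVersion": "kueue.x-k8s.io/v1beta1",
    "kind": "ProvisioningRequestConfig",
    "metadata": {"name": "gke-gpu-only"},
    "spec": {
        "provisioningClassName": "queued-provisioning.gke.io",
        # Only GPU requests are handed to the autoscaler; CPU is not managed.
        "managedResources": ["nvidia.com/gpu"],
    },
}


def podsets_in_provisioning_request(podsets, managed_resources):
    """Hypothetical model of Kueue's filtering: keep PodSets that request at
    least one managed resource. An empty list means every resource is managed."""
    if not managed_resources:
        return [p["name"] for p in podsets]
    return [
        p["name"]
        for p in podsets
        if any(resource in p["requests"] for resource in managed_resources)
    ]


# A RayJob with a CPU-only head and a GPU worker group yields two PodSets
# (names are illustrative).
rayjob_podsets = [
    {"name": "head", "requests": {"cpu": "2", "memory": "8Gi"}},
    {"name": "gpu-workers", "requests": {"cpu": "8", "nvidia.com/gpu": "1"}},
]

# Without managedResources, both PodSets would go into one request.
unfiltered = podsets_in_provisioning_request(rayjob_podsets, [])

# With managedResources set to GPUs only, the head PodSet is excluded.
included = podsets_in_provisioning_request(
    rayjob_podsets, provisioning_request_config["spec"]["managedResources"]
)

print(json.dumps(provisioning_request_config, indent=2))
print(unfiltered)  # ['head', 'gpu-workers']
print(included)    # ['gpu-workers']
```

The unfiltered case is the one GKE rejects with "supports only single PodSet per request". The filtered case leaves exactly one PodSet, which fits that constraint as long as the head does not request GPUs.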

Also note that the IdenticalWorkloadSchedulingRequirements feature is not meant for combining GPU and CPU PodSets. It is useful for combining PodSets that use the same resource types, for example when the "head" PodSet also uses a GPU.

cc @andrewsykim

Dec 10 '25 17:12 mimowo