# ryanaoleary

Results: 9 issues and pull requests by ryanaoleary

### Search before asking

- [X] I had searched in the [issues](https://github.com/ray-project/kuberay/issues) and found no similar feature requirement.

### Description

In order to support TPU multi-host with Kuberay, it is...

Labels: enhancement, tpu

This PR moves the Ray TPU webhook from the `applications/ray` folder (reserved for Terraform templates) to a new `ray-on-gke/tpu` folder. This PR contains no changes to the webhook code.

## Why are these changes needed?

Adds support for the Ray autoscaler and Kuberay NodeProvider to scale down TPU podslices. TPU podslices are atomic, so it is necessary to scale down all...
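The atomicity constraint can be illustrated with a minimal sketch. The `Worker` type and `replica_index` field below are hypothetical stand-ins, not KubeRay's actual API: the point is only that scaling down any single worker in a multi-host podslice must expand to the whole replica.

```python
# Hypothetical sketch: TPU podslices are atomic, so when one worker in a
# multi-host replica is selected for scale-down, every worker in that
# replica must be deleted together. Names here are illustrative only.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    replica_index: int  # which TPU podslice replica this worker belongs to


def workers_to_delete(workers, idle_worker_names):
    """Expand a set of idle workers to the full podslice replicas they belong to."""
    by_replica = defaultdict(list)
    for w in workers:
        by_replica[w.replica_index].append(w)
    # Any replica containing at least one idle worker is deleted whole.
    doomed_replicas = {w.replica_index for w in workers if w.name in idle_worker_names}
    return [w for r in doomed_replicas for w in by_replica[r]]
```

With this grouping, marking one host of a two-host replica idle removes both hosts, which mirrors the all-or-nothing behavior the PR describes.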

Labels: kuberay, P1.5

## Why are these changes needed?

Add documentation for users seeking to use Kuberay with TPUs on GKE, similar to the existing documentation for [GPUs](https://github.com/ray-project/ray/blob/master/doc/source/cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md). This PR depends on example...

I'm receiving an error when attempting to run:

```
ray job submit -- python run_ray_serve_interleave.py \
  --tpu_chips=4 --num_hosts=1 --size=8B --model_name=llama-3 \
  --batch_size=8 --max_cache_length=2048 \
  --tokenizer_path=$tokenizer_path --checkpoint_path=$output_ckpt_dir \
  --quantize_weights=True --quantize_type="int8_per_channel" \
  --quantize_kv_cache=True --sharding_config="default_shardings/llama.yaml"
```

on a...

### Your current environment

```
Collecting environment information...
INFO 10-03 20:20:36 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
PyTorch version: 2.5.0
Is debug build: False
CUDA used...
```

Labels: usage

## Why are these changes needed?

This PR adds recommended fields to the v6e-256 RayCluster and RayJob sample manifests. For the larger slice size, adding `privileged: true` resolves an `UNKNOWN:...
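For context, `privileged: true` is a standard Kubernetes `securityContext` field on the worker container. A minimal sketch of where it sits in a RayCluster worker pod template follows; the group name, container name, and image are placeholders, not the actual manifest contents:

```yaml
# Illustrative fragment only; names and image are placeholders.
workerGroupSpecs:
  - groupName: tpu-group          # placeholder group name
    template:
      spec:
        containers:
          - name: ray-worker      # placeholder container name
            image: rayproject/ray:latest
            securityContext:
              privileged: true    # grants the container elevated device access
```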

## Why are these changes needed?

This PR adds a fake TPU test case, similar to the existing fake GPU test case for autoscaling, that uses detached actors to verify...

Milestone: 1.3.0

# Description

This PR adds a simple inference script to be used for a Ray multi-host TPU example serving Meta-Llama-3-70B. Similar to the other scripts in the `/llm/` folder, `serve_tpu.py`...