# ryanaoleary

Results: 9 issues and pull requests by ryanaoleary

### Search before asking

- [X] I had searched in the [issues](https://github.com/ray-project/kuberay/issues) and found no similar feature requirement.

### Description

In order to support TPU multi-host with Kuberay, it is...

Labels: enhancement, tpu

This PR moves the Ray TPU webhook from the `applications/ray` folder (reserved for Terraform templates) to a new `ray-on-gke/tpu` folder. This PR contains no changes to the webhook code.

## Why are these changes needed?

Adds support for the Ray autoscaler and Kuberay NodeProvider to scale down TPU podslices. TPU podslices are atomic, so it is necessary to scale down all...
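The atomicity constraint can be illustrated with a minimal sketch. The `Worker` type and `replica_index` field below are hypothetical stand-ins, not KubeRay's actual API: the point is only that scaling down any single worker in a multi-host podslice must expand to the whole replica.

```python
# Hypothetical sketch: TPU podslices are atomic, so when one worker in a
# multi-host replica is selected for scale-down, every worker in that
# replica must be deleted together. Names here are illustrative only.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    replica_index: int  # which TPU podslice replica this worker belongs to


def workers_to_delete(workers, idle_worker_names):
    """Expand a set of idle workers to the full podslice replicas they belong to."""
    by_replica = defaultdict(list)
    for w in workers:
        by_replica[w.replica_index].append(w)
    # Any replica containing at least one idle worker is deleted whole.
    doomed_replicas = {w.replica_index for w in workers if w.name in idle_worker_names}
    return [w for r in doomed_replicas for w in by_replica[r]]
```

With this grouping, marking one host of a two-host replica idle removes both hosts, which mirrors the all-or-nothing behavior the PR describes.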

Labels: kuberay, P1.5

## Why are these changes needed?

Add documentation for users seeking to use Kuberay with TPUs on GKE, similar to the existing documentation for [GPUs](https://github.com/ray-project/ray/blob/master/doc/source/cluster/kubernetes/user-guides/gcp-gke-gpu-cluster.md). This PR depends on example...

I'm receiving an error when attempting to run:

```
ray job submit -- python run_ray_serve_interleave.py \
  --tpu_chips=4 --num_hosts=1 --size=8B --model_name=llama-3 \
  --batch_size=8 --max_cache_length=2048 \
  --tokenizer_path=$tokenizer_path --checkpoint_path=$output_ckpt_dir \
  --quantize_weights=True --quantize_type="int8_per_channel" \
  --quantize_kv_cache=True --sharding_config="default_shardings/llama.yaml"
```

on a...

### Your current environment

```
Collecting environment information...
INFO 10-03 20:20:36 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
PyTorch version: 2.5.0
Is debug build: False
CUDA used...
```

Labels: usage

## Why are these changes needed?

This PR adds recommended fields to the v6e-256 RayCluster and RayJob sample manifests. For the larger slice size, adding `privileged: true` resolves an `UNKNOWN:...
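For context, `privileged: true` is a standard Kubernetes `securityContext` field on the worker container. A minimal sketch of where it sits in a RayCluster worker pod template follows; the group name, container name, and image are placeholders, not the actual manifest contents:

```yaml
# Illustrative fragment only; names and image are placeholders.
workerGroupSpecs:
  - groupName: tpu-group          # placeholder group name
    template:
      spec:
        containers:
          - name: ray-worker      # placeholder container name
            image: rayproject/ray:latest
            securityContext:
              privileged: true    # grants the container elevated device access
```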

## Why are these changes needed?

This PR adds a fake TPU test case, similar to the existing fake GPU test case for autoscaling, that uses detached actors to verify...

Milestone: 1.3.0

# Description

This PR adds a simple inference script to be used for a Ray multi-host TPU example serving Meta-Llama-3-70B. Similar to the other scripts in the `/llm/` folder, `serve_tpu.py`...