Bihan Rana

16 issue results for Bihan Rana

### Steps to reproduce Step 1: Specify a fixed disk in `.dstack.yml`, e.g.: ``` type: dev-environment # Use either `python` or `image` to configure environment python: "3.11" # image: ghcr.io/huggingface/text-generation-inference:latest ide:...
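The truncated snippet above can be sketched as a complete config; note that the `resources.disk` field is an assumption about dstack's schema, and the exact syntax may differ by version:

```yaml
# .dstack.yml — minimal dev environment with a fixed disk size (sketch)
type: dev-environment
# Use either `python` or `image` to configure the environment
python: "3.11"
# image: ghcr.io/huggingface/text-generation-inference:latest
ide: vscode
resources:
  disk: 100GB   # assumed field for pinning the disk size; verify against the dstack docs
```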

bug

### Steps to reproduce This PR [1119](https://github.com/dstackai/dstack/pull/1119) provides spot provisioning for Runpod. Provisioning with `dstack run . -b runpod --gpu 1 --spot --retry` would re-submit the run if the pod is...

bug

### Problem Runpod offers two types of cloud [Secure, Community]. The Community cloud offers machines at cheaper prices than the Secure cloud. Currently, gpuhunt aggregates and lists offers from both Secure and...

feature

### 🚀 The feature, motivation and pitch The current installation process for vLLM on AMD devices presents significant challenges in terms of installation time: 1. The ROCm Docker image is...

feature request

### System Info System Info TGI Docker Image: ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm MODEL: meta-llama/Llama-3.1-405B-Instruct-FP8 Hardware used: Intel® Xeon® Platinum 8470 2G, 52C/104T, 16GT/s, 105M Cache, Turbo, HT (350W) [x2] AMD MI300X GPU OAM...

### System Info TGI Docker Image: `ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm` MODEL: meta-llama/Llama-3.1-405B-Instruct Hardware used: Intel® Xeon® Platinum 8470 2G, 52C/104T, 16GT/s, 105M Cache, Turbo, HT (350W) [x2] AMD MI300X GPU OAM 192GB 750W...

Does trtllm-serve enable prefix caching automatically? I want to serve Deepseek-R1 with prefix caching enabled. I am deploying as follows: ``` trtllm-serve --backend pytorch --max_batch_size $MAX_BATCH_SIZE --max_num_tokens $MAX_NUM_TOKENS --max_seq_len...
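For reference, prefix caching in TensorRT-LLM is driven by its KV-cache block-reuse setting; below is a hedged sketch of enabling it via an extra LLM API options file. The `--extra_llm_api_options` flag and the `enable_block_reuse` key are assumptions based on the TensorRT-LLM documentation and may vary by release:

```yaml
# extra-llm-api-config.yml — sketch; pass with:
#   trtllm-serve ... --extra_llm_api_options extra-llm-api-config.yml
kv_cache_config:
  enable_block_reuse: true   # block reuse is TensorRT-LLM's prefix-caching mechanism
```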

triaged

**Description:** While following the official [multi-node training tutorial for AMD clusters](https://verl.readthedocs.io/en/latest/start/multinode.html#multi-node-training-on-amd-clusters), we encountered issues when using RCCL with RoCE RDMA for communication. Training only works with the GLOO backend, and...
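A common first diagnostic for RCCL-over-RoCE problems like the one above is pinning the NIC and GID index explicitly; here is a hedged sketch using standard NCCL environment variables (which RCCL honors). The interface and HCA names are hypothetical placeholders for the cluster's actual devices:

```shell
# Sketch: steer RCCL onto the RoCE NICs; device names are hypothetical examples
export NCCL_SOCKET_IFNAME=eth0     # bootstrap/socket interface
export NCCL_IB_HCA=mlx5_0,mlx5_1   # RDMA devices to use
export NCCL_IB_GID_INDEX=3         # RoCEv2 GID index on many clusters
export NCCL_DEBUG=INFO             # surface transport selection in the logs
```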

AMD