Bihan Rana

16 issue results for Bihan Rana

### Steps to reproduce Step 1: Specify a fixed disk in `.dstack.yml`, e.g.: ``` type: dev-environment # Use either `python` or `image` to configure environment python: "3.11" # image: ghcr.io/huggingface/text-generation-inference:latest ide:...
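The truncated snippet above can be sketched as a complete config; note that the `resources.disk` field is an assumption about dstack's schema, and the exact syntax may differ by version:

```yaml
# .dstack.yml — minimal dev environment with a fixed disk size (sketch)
type: dev-environment
# Use either `python` or `image` to configure the environment
python: "3.11"
# image: ghcr.io/huggingface/text-generation-inference:latest
ide: vscode
resources:
  disk: 100GB   # assumed field for pinning the disk size; verify against the dstack docs
```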

bug

### Steps to reproduce This PR [1119](https://github.com/dstackai/dstack/pull/1119) provides spot provisioning for Runpod. Provisioning with `dstack run . -b runpod --gpu 1 --spot --retry` would re-submit the run if the pod is...

bug

### Problem Runpod offers two types of cloud [Secure, Community]. The Community cloud offers machines at cheaper prices than the Secure cloud. Currently, gpuhunt aggregates and lists offers from both Secure and...

feature

### 🚀 The feature, motivation and pitch The current installation process for vLLM on AMD devices presents significant challenges in terms of installation time: 1. The ROCm Docker image is...

feature request

### System Info System Info TGI Docker Image: ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm MODEL: meta-llama/Llama-3.1-405B-Instruct-FP8 Hardware used: Intel® Xeon® Platinum 8470 2G, 52C/104T, 16GT/s, 105M Cache, Turbo, HT (350W) [x2] AMD MI300X GPU OAM...

### System Info TGI Docker Image: `ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm` MODEL: meta-llama/Llama-3.1-405B-Instruct Hardware used: Intel® Xeon® Platinum 8470 2G, 52C/104T, 16GT/s, 105M Cache, Turbo, HT (350W) [x2] AMD MI300X GPU OAM 192GB 750W...

Does trtllm-serve enable prefix caching automatically? I want to serve Deepseek-R1 with prefix caching enabled. I am deploying as follows: ``` trtllm-serve --backend pytorch --max_batch_size $MAX_BATCH_SIZE --max_num_tokens $MAX_NUM_TOKENS --max_seq_len...
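For reference, prefix caching in TensorRT-LLM is driven by its KV-cache block-reuse setting; below is a hedged sketch of enabling it via an extra LLM API options file. The `--extra_llm_api_options` flag and the `enable_block_reuse` key are assumptions based on the TensorRT-LLM documentation and may vary by release:

```yaml
# extra-llm-api-config.yml — sketch; pass with:
#   trtllm-serve ... --extra_llm_api_options extra-llm-api-config.yml
kv_cache_config:
  enable_block_reuse: true   # block reuse is TensorRT-LLM's prefix-caching mechanism
```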

triaged

**Description:** While following the official [multi-node training tutorial for AMD clusters](https://verl.readthedocs.io/en/latest/start/multinode.html#multi-node-training-on-amd-clusters), we encountered issues when using RCCL with RoCE RDMA for communication. Training only works with the GLOO backend, and...
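A common first diagnostic for RCCL-over-RoCE problems like the one above is pinning the NIC and GID index explicitly; here is a hedged sketch using standard NCCL environment variables (which RCCL honors). The interface and HCA names are hypothetical placeholders for the cluster's actual devices:

```shell
# Sketch: steer RCCL onto the RoCE NICs; device names are hypothetical examples
export NCCL_SOCKET_IFNAME=eth0     # bootstrap/socket interface
export NCCL_IB_HCA=mlx5_0,mlx5_1   # RDMA devices to use
export NCCL_IB_GID_INDEX=3         # RoCEv2 GID index on many clusters
export NCCL_DEBUG=INFO             # surface transport selection in the logs
```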

AMD