Bihan Rana
Thank you @maxweiss. I will try with `ROCm 6.4.0`
@maxweiss You are right: with ROCm 6.4.0 it shows all PARTITION entries with valid values, but only the devices at indices 0, 8, 16, 24, 32, 40, 48, 56 are attachable via Docker’s...
@maxweiss Thank you once again. Yes, this looks like just a display error. I ran vLLM inference and it worked too. I also tried with `ROCm 6.4.1` and it worked...
To successfully run vLLM on the GH200, we followed these steps:

```
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:23.10-py3
# Inside the container
$ pip3 install --pre torch torchvision...
```
@peterschmidt85
1. For Inference examples we do not need to build manually. (I will update)
2. For Fine-tuning (I will check and let you know)
3. Yes I will update...
> Currently, we require the user to specify `image` always when using AMD. It would be cool if we provide a small and up-to-date AMD image with ROCm drivers.

@peterschmidt85...
[gateway_logs.txt](https://github.com/user-attachments/files/24261039/gateway_logs.txt) @jvstme Here are the gateway logs from around that time.
> These are the gateway logs about replica `23d3e9` that the server failed to register: > > ``` > Dec 19 13:03:19 ip-172-31-21-247 sh[29500]: 2025-12-19 13:03:19,068 - dstack._internal.proxy.gateway.services.registry - DEBUG...
I will resolve merge conflicts as the review continues.
## Related PRs

https://github.com/dstackai/dstack/pull/3205 from @DragonStuff