Frank Mai
For reasoning models, it's better to increase the context size with `--ctx-size` or reduce the parallelism with `--threads-http`.
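A minimal sketch of what that could look like; the model path and the concrete values are placeholders, not recommendations from this thread:

```shell
# Give a reasoning model more room for long chains of thought,
# and lower HTTP-level parallelism so each request gets more context.
llama-box --model /path/to/model.gguf \
  --ctx-size 32768 \
  --threads-http 2
```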
First of all, confirm whether this also affects Windows. After testing, llama-box built with ROCm 6.1 works well on Windows.
We can bump the ROCm version of llama-box to 6.2.4 to fix this.
Please try with the following package.
Per https://github.com/ROCm/ROCK-Kernel-Driver/issues/153: after testing with v6.2.4 and `GPU_MAX_HW_QUEUES=1`, this issue is gone. See https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/env_variables.html for the HIP environment variables.
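A minimal sketch of applying the workaround when launching llama-box; the model path is a placeholder:

```shell
# Limit HIP to a single hardware queue per device to work around
# the amdgpu issue referenced above.
export GPU_MAX_HW_QUEUES=1
llama-box --model /path/to/model.gguf
```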
After some digging, the issue still exists.

#### host info: 6.2.4

```shell
$ rocminfo --support
ROCk module version 6.8.5 is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.14
Runtime...
```
@Finenyaco We can verify with v0.0.105, deploying 5 instances on one card. Keep this issue open; do not close it.
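A hypothetical sketch of that setup; the device index, ports, and model path are placeholders:

```shell
# Pin all instances to GPU 0, then start 5 llama-box instances,
# each listening on its own port.
export HIP_VISIBLE_DEVICES=0
for port in 8081 8082 8083 8084 8085; do
  llama-box --model /path/to/model.gguf --port "$port" &
done
wait
```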
Under Ascend RC mode, llama-box cannot allocate memory with Virtual Memory Management; instead, it falls back to Buffer Memory Management. However, up to and including v0.0.121, Buffer Memory Management easily caused memory leaks,...
Please test with v0.0.127, using `--warmup` to observe whether the garbage output still appears.
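A rough sketch of that test, assuming llama-box exposes an OpenAI-compatible endpoint on the given port; the model path and port are placeholders:

```shell
# Start v0.0.127 with warmup enabled.
llama-box --model /path/to/model.gguf --warmup --port 8080 &

# Send a short completion and inspect the response for garbage tokens.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```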
Screenshot of `dmesg -T | grep amdgpu`: it seems this issue happened around the time those log entries were emitted.
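To correlate the failure with the kernel log in real time, a simple sketch (root may be needed to read the kernel ring buffer):

```shell
# Follow amdgpu kernel messages with human-readable timestamps,
# so they can be matched against the moment llama-box misbehaves.
sudo dmesg -wT | grep -i amdgpu
```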