Frank Mai
For reasoning models, it's better to increase the context size with `--ctx-size` or reduce the parallelism with `--threads-http`.
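A minimal sketch of what that could look like; the model path and the concrete values are placeholders, not recommendations from this thread:

```shell
# Give a reasoning model more room for long chains of thought,
# and lower HTTP-level parallelism so each request gets more context.
llama-box --model /path/to/model.gguf \
  --ctx-size 32768 \
  --threads-http 2
```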
First of all, confirm whether this also affects Windows. After testing, llama-box built with ROCm 6.1 works well on Windows.
We can bump the ROCm version of llama-box to 6.2.4 to fix this.
Please try with the following package.
Per https://github.com/ROCm/ROCK-Kernel-Driver/issues/153: after testing with v6.2.4 and `GPU_MAX_HW_QUEUES=1`, this issue is gone. See https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/env_variables.html for the HIP environment variables.
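A minimal sketch of applying the workaround when launching llama-box; the model path is a placeholder:

```shell
# Limit HIP to a single hardware queue per device to work around
# the amdgpu issue referenced above.
export GPU_MAX_HW_QUEUES=1
llama-box --model /path/to/model.gguf
```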
After some digging, the issue still exists.

#### host info: 6.2.4

```shell
$ rocminfo --support
ROCk module version 6.8.5 is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.14
Runtime...
```
@Finenyaco We can verify with v0.0.105, deploying 5 instances on one card. Keep this issue open; do not close it.
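A hypothetical sketch of that setup; the device index, ports, and model path are placeholders:

```shell
# Pin all instances to GPU 0, then start 5 llama-box instances,
# each listening on its own port.
export HIP_VISIBLE_DEVICES=0
for port in 8081 8082 8083 8084 8085; do
  llama-box --model /path/to/model.gguf --port "$port" &
done
wait
```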
Under Ascend RC mode, llama-box cannot allocate memory with Virtual Memory Management; instead, it falls back to Buffer Memory Management. However, up to and including v0.0.121, Buffer Memory Management easily caused memory leaks,...
Please test with v0.0.127, using `--warmup` to observe whether the garbage output still appears.
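A rough sketch of that test, assuming llama-box exposes an OpenAI-compatible endpoint on the given port; the model path and port are placeholders:

```shell
# Start v0.0.127 with warmup enabled.
llama-box --model /path/to/model.gguf --warmup --port 8080 &

# Send a short completion and inspect the response for garbage tokens.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```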
Screenshot of `dmesg -T | grep amdgpu`: it seems this issue happened around the time those log entries were emitted.
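To correlate the failure with the kernel log in real time, a simple sketch (root may be needed to read the kernel ring buffer):

```shell
# Follow amdgpu kernel messages with human-readable timestamps,
# so they can be matched against the moment llama-box misbehaves.
sudo dmesg -wT | grep -i amdgpu
```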