Frank Mai
It seems SGLang does not support gfx1101; see https://github.com/ROCm/aiter/blob/17a25514a1c1294b193eef984089c780e0bf53cf/aiter/jit/utils/chip_info.py#L11-L21 and https://github.com/ROCm/aiter/blob/17a25514a1c1294b193eef984089c780e0bf53cf/csrc/cpp_itfs/utils.py#L117-L123. Even after configuring `HSA_OVERRIDE_GFX_VERSION=11.0.0` to mock gfx1100, I still encounter the error below.
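As a quick sanity check, here is a minimal sketch, assuming a ROCm build of PyTorch; the supported set below is an illustrative subset, see the linked `chip_info.py` for the authoritative list:

```python
import torch  # assumes a ROCm build of PyTorch

# Illustrative subset of architectures aiter recognizes; gfx1101 is absent.
SUPPORTED = {"gfx942", "gfx1100"}

# gcnArchName looks like "gfx1101:sramecc+:xnack-"; keep only the arch part.
arch = torch.cuda.get_device_properties(0).gcnArchName.split(":")[0]
print(f"detected {arch}, supported by aiter: {arch in SUPPORTED}")
```

Note that `HSA_OVERRIDE_GFX_VERSION` only changes what the runtime reports; code paths selected for gfx1100 may still misbehave on gfx1101 hardware, which may be why the override is not enough here.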
@travelyoga You should also provide the spec of the running vLLM container via `docker inspect`.
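For reference, something like the following pulls out the GPU-related parts of the container spec (the container name is hypothetical):

```python
import json
import subprocess

CONTAINER = "vllm"  # hypothetical; substitute the actual container name or ID

# `docker inspect` emits a JSON array with one object per container.
spec = json.loads(subprocess.check_output(["docker", "inspect", CONTAINER]))[0]

# The GPU device requests and environment are usually the relevant parts.
print(json.dumps(spec["HostConfig"].get("DeviceRequests"), indent=2))
print([e for e in spec["Config"]["Env"] if e.startswith(("NVIDIA", "CUDA"))])
```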
The running container has already been injected with the correct visible devices. The first RTX A5000 device has been recognized and has loaded the model weights. The root cause is as follows....
I have tested this on [Qwen.ai](https://chat.qwen.ai/) twice, and I found that Qwen's output is stable: the response matches the screenshot in the issue. Recognition processing is usually affected by sampler settings,...
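As an illustration, here is a sketch of pinning the sampler against an OpenAI-compatible endpoint (the base URL and model name are assumptions) to check whether the variance comes from sampling:

```python
from openai import OpenAI

# Hypothetical endpoint and model name; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="-")

# temperature=0 plus a fixed seed makes generation (near-)deterministic,
# so any remaining variance is not coming from the sampler.
resp = client.chat.completions.create(
    model="qwen2.5-vl-7b-instruct",
    messages=[{"role": "user", "content": "Describe the image."}],
    temperature=0.0,
    top_p=1.0,
    seed=42,
)
print(resp.choices[0].message.content)
```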
~~Fixed by https://github.com/gpustack/runtime/commit/ab869fffc6ae49df3138d3662280b461e828d194; this should be included in a later release.~~ After comparing some information, we found that `910B` is not the Ascend 910B series; it should belong to the Ascend...
I was very confused about why Ollama doesn't use the OCI standard to store its models, so I created an alternative to find more answers: https://github.com/gpustack/gguf-packer-go
@lamhktommy can you test this with v0.0.122?
> [@lamhktommy](https://github.com/lamhktommy) can you test this with v0.0.122?

In a further test, v0.0.122 (built with Ascend 8.0.rc2.alpha003) still crashes with a large context size. We have released v0.0.123 (built with...
The Q8_0 `mul_mat` implementation is limited at present; let's move this out of v0.6.0 and figure it out later.
As a workaround, we suggest using FP16 instead.
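For example, here is a sketch of producing an FP16 GGUF with llama.cpp's converter instead of quantizing to Q8_0 (paths are hypothetical):

```python
import subprocess

# Hypothetical paths; assumes llama.cpp's convert_hf_to_gguf.py is on hand.
subprocess.check_call([
    "python", "convert_hf_to_gguf.py",
    "--outtype", "f16",            # FP16 sidesteps the Q8_0 mul_mat limitation
    "--outfile", "model-f16.gguf",
    "path/to/hf-model",
])
```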