Frank Mai
After some code review, this might be caused by a missing model configuration file: `quant_model_description.json`. This file guides each layer's quantization method. Can we download a model including...
Modify the `quant_model_description.json` file: remove the top-level field `model_quant_type: "W8A8"`, then append the following fields at the bottom:
According to https://docs.vllm.ai/projects/ascend/en/latest/tutorials/DeepSeek-R1.html, vLLM Ascend requires cross-node DP, which GPUStack does not support yet.
Use MindIE instead, see https://github.com/gpustack/gpustack/issues/3295.
Same as https://www.hiascend.com/forum/thread-0231199593598695282-1-1.html.
Force-pull the new image and try again. According to https://www.hiascend.com/forum/thread-02121201412224061582-1-1.html, this could be caused by an incorrect default ATB configuration.
gpustack-runner v0.1.21.post1
#### 1. Preparation

Adjust the script provided by @gitlawr with the following changes:

- Add a warmup round to eliminate boundary impact
- Parameterize `base url`, `model name` and `words...`
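The warmup idea above can be sketched as a small timing harness; this is an illustrative skeleton, not the actual script, and `request_once` stands in for whatever request function is built from the parameterized `base url` and `model name`:

```python
import time
from statistics import mean
from typing import Callable


def benchmark(request_once: Callable[[], None],
              rounds: int = 5, warmup: int = 1) -> float:
    """Time `request_once` over several rounds and return the mean
    latency in seconds. Warmup rounds run first and their timings are
    discarded, eliminating cold-start boundary impact."""
    for _ in range(warmup):
        request_once()  # warmup: result and timing discarded

    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        request_once()
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

Injecting the request as a callable keeps the timing loop independent of the endpoint, so the same harness works whichever base URL or model is benchmarked.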
There is too little useful information here; please provide the full log of the model instance.
Actually, we release both amd64 and arm64 Docker images. We cannot investigate this issue further without a log.