Frank Mai
After some code review, this might be caused by a missing model configuration file: `quant_model_description.json`. This file guides each layer's quantization method. Can we download a model including...
Modify the `quant_model_description.json` file: remove the top-level field `model_quant_type: "W8A8"`, then append the following fields at the bottom:
According to https://docs.vllm.ai/projects/ascend/en/latest/tutorials/DeepSeek-R1.html, vLLM Ascend requires cross-node DP, which GPUStack does not support yet.
Use MindIE instead, see https://github.com/gpustack/gpustack/issues/3295.
Same as https://www.hiascend.com/forum/thread-0231199593598695282-1-1.html.
Force-pull the new image and try again. According to https://www.hiascend.com/forum/thread-02121201412224061582-1-1.html, this could be caused by an incorrect default ATB configuration.
gpustack-runner v0.1.21.post1
#### 1. Preparation

Adjust the script provided by @gitlawr with the following changes:

- Add a warmup round to eliminate boundary impact
- Parameterize `base url`, `model name` and `words...`
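The warmup idea above can be sketched as a small timing harness; this is an illustrative skeleton, not the actual script, and `request_once` stands in for whatever request function is built from the parameterized `base url` and `model name`:

```python
import time
from statistics import mean
from typing import Callable


def benchmark(request_once: Callable[[], None],
              rounds: int = 5, warmup: int = 1) -> float:
    """Time `request_once` over several rounds and return the mean
    latency in seconds. Warmup rounds run first and their timings are
    discarded, eliminating cold-start boundary impact."""
    for _ in range(warmup):
        request_once()  # warmup: result and timing discarded

    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        request_once()
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

Injecting the request as a callable keeps the timing loop independent of the endpoint, so the same harness works whichever base URL or model is benchmarked.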
There is too little useful information here; please provide the full log of the model instance.
Actually, we release both amd64 and arm64 Docker images. We cannot investigate this issue further without a log.