WSL2 and vLLM compatibility issues

Open badwoman0 opened this issue 7 months ago • 1 comments

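Are there a lot of compatibility issues between WSL2 and vLLM? In weclone/core/inference/vllm_infer.py I lowered "max_model_len": cutoff_len + max_new_tokens as far as I could, and I picked a 1.5B model, so in theory it should run fine on a GPU with 8 GB of VRAM. But it keeps running out of VRAM at the KV cache stage. Does anyone know what's going on?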

badwoman0, May 15 '25 02:05
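
For a rough sense of scale, the KV cache needed by one sequence can be estimated from the model's shape. The sketch below is only back-of-the-envelope arithmetic. The layer count, KV head count, and head size are illustrative assumptions for a 1.5B-class model with grouped-query attention, not values taken from this issue, and the function names are hypothetical.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies across all layers."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_cache_mib(max_model_len: int, **shape: int) -> float:
    """KV cache size in MiB for a single sequence of max_model_len tokens."""
    return max_model_len * kv_cache_bytes_per_token(**shape) / 2**20


# Assumed shape (illustrative only): 28 layers, 2 KV heads, head_dim 128, fp16.
assumed_shape = {"num_layers": 28, "num_kv_heads": 2, "head_dim": 128, "dtype_bytes": 2}
print(kv_cache_mib(4096, **assumed_shape))
```

Under these assumptions a single full-length sequence needs on the order of a hundred MiB, so lowering max_model_len alone may not be what decides an out-of-memory error. vLLM reserves KV cache space up front as a fraction of GPU memory (`gpu_memory_utilization`), so memory already in use by other processes can matter more.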

Is it running out of memory during inference? Could you try turning off prefix caching? https://github.com/xming521/WeClone/blob/4c383d3c957313124217f864a9187d54ec09b747/weclone/core/inference/vllm_infer.py#L120C10-L120C31

xming521, May 15 '25 06:05
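
A minimal sketch of what the suggestion above might look like as engine arguments. The helper name `build_engine_args` and the numeric defaults are hypothetical and are not the actual WeClone code. The keys themselves are standard vLLM engine arguments, and `gpu_memory_utilization` and `enforce_eager` are extra memory-saving knobs added here as assumptions, not something recommended in this thread.

```python
def build_engine_args(cutoff_len: int, max_new_tokens: int,
                      gpu_memory_utilization: float = 0.7) -> dict:
    """Hypothetical helper: collect memory-conservative vLLM engine arguments."""
    return {
        # Same expression as in vllm_infer.py: prompt budget plus generation budget.
        "max_model_len": cutoff_len + max_new_tokens,
        # Fraction of GPU memory vLLM may reserve (assumed value; tune for your GPU).
        "gpu_memory_utilization": gpu_memory_utilization,
        # The suggestion from this thread: turn prefix caching off.
        "enable_prefix_caching": False,
        # Assumption: skip CUDA graph capture to save some memory, at a speed cost.
        "enforce_eager": True,
    }


engine_args = build_engine_args(cutoff_len=1024, max_new_tokens=512)
print(engine_args)
# With vLLM installed, these could be passed as: LLM(model=<your model path>, **engine_args)
```

If the error persists with prefix caching off, lowering `gpu_memory_utilization` further is one more thing to try, since on WSL2 the Windows desktop may already be holding part of the 8 GB.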