Jee Jee Li

Results: 206 comments by Jee Jee Li

@ywang96 @HwwwwwwwH Sorry to bother you again, but could you please let me know the current progress of this PR?

> @jeejeelee Could you adapt the operator in the next release?
> ```
> No suitable kernel. h_in=16 h_out=76032 dtype=Float out_dtype=Half
> No suitable kernel. h_in=16 h_out=55552 dtype=Float out_dtype=BFloat16
> ```
> ```c++
> f(in_T, out_T, W_T, narrow, 55552) \
> f(in_T, out_T, W_T, narrow, 76032) \
> ```

Which model do these sizes come from? Also, you can submit a PR yourself to handle these sizes.

> > Current punica kernel can't process `h_out=3424`, you can set `--tensor-parallel-size 2` to avoid this error
>
> @jeejeelee can you support this on 8 gpus? I'm...

> Could you help with the adaptation? If I submit a PR myself, which development branch should I base my changes on?

You can look up how to submit a PR to a project on GitHub.

> > > > Current punica kernel can't process `h_out=3424`, you can set `--tensor-parallel-size 2` to avoid this error
> > >
> > > ...

> > > Hello, I ran into a similar problem when loading Baichuan2-13B; it occurs in both 0.3.3 and 0.4: RuntimeError: No suitable kernel. h_in=32 h_out=15360 dtype=Float out_dtype=BFloat16
> >
> > In the current vLLM version, the punica kernels do not support 15360; my earlier PR missed this, sorry. You can add the following at https://github.com/vllm-project/vllm/blob/main/csrc/punica/bgmv/bgmv_config.h#L48:
> > ```c++
> > f(in_T, out_T, W_T, narrow,...
> > ```

> Hello, I ran into the same problem when deploying Qwen2-7B after fine-tuning:
> ```
> [rank0]: Traceback (most recent call last):
> [rank0]:   File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
> [rank0]:     return _run_code(code, main_globals, None,
> [rank0]:   File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
> [rank0]:     exec(code, run_globals)...
> ```

@liangxiao777 FYI https://github.com/vllm-project/vllm/pull/5441

You can refer to https://github.com/vllm-project/vllm/pull/4087 and pull the corresponding branch for testing.

@chenqianfzh We cannot ignore the format errors; you can run `bash format.sh` to check for formatting issues.