Jee Jee Li

Results: 209 comments by Jee Jee Li

> @mgoin @jeejeelee Could you help take a look at this PR which adds TP to bnb.
>
> Wonder whether you can give me a hand in the test...

This might be the same issue as https://github.com/vllm-project/vllm/pull/8329

> It's odd that Qwen2-VL-7B-Instruct-GPTQ-Int4 works while -GPTQ-Int8 does not.

The addition of extra bias likely occurred during int8 quantization. I am now fixing this bug.

You can try commenting out or deleting:

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
```

Have you tested triton 3.2.0?

> Thanks. LGTM
>
> Can you also add a `Co-authored-by: Aaron Pham ` to the description.

Done

> Ah, we need to gate the copy over in `_is_cuda()` only here. > > ```diff > 27fdbeea7 - chore: only gated in CUDA (HEAD -> fix-flash-att-rotray) > > Signed-off-by:...

I remember you mentioned a similar issue a long time ago - has it still not been resolved?

@robertgshaw2-neuralmagic @comaniac There is a potential risk of illegal memory access. I have made changes but have not yet submitted them. Please refer to: [add_device_gurad](https://github.com/jeejeelee/vllm/blob/fix-moe-kernel/csrc/moe_align_block_size_kernels.cu#L115)