Jee Jee Li
@aldettinger Can you test whether #18773 fixes your issue?
> @mgoin @jeejeelee Could you help take a look at this PR which adds TP to bnb.
>
> Wonder whether you can give me a hand in the test...
This might be the same issue as https://github.com/vllm-project/vllm/pull/8329
> It's odd that Qwen2-VL-7B-Instruct-GPTQ-Int4 works while -GPTQ-Int8 does not.

The extra bias was likely added during int8 quantization. I am fixing this bug now.
You can try commenting out or deleting:
```python
device = "cuda" if torch.cuda.is_available() else "cpu"
```
Have you tested triton 3.2.0?
> Thanks. LGTM
>
> Can you also add a `Co-authored-by: Aaron Pham ` to the description.

Done
> Ah we need to gate the copy over in `_is_cuda()` only here.
>
> ```diff
> 27fdbeea7 - chore: only gated in CUDA (HEAD -> fix-flash-att-rotray)
>
> Signed-off-by:...
> ```
I remember you mentioned a similar issue a long time ago. Has it still not been resolved?
@robertgshaw2-neuralmagic @comaniac There is a potential risk of illegal memory access. I have made changes but have not yet submitted them. Please refer to: [add_device_gurad](https://github.com/jeejeelee/vllm/blob/fix-moe-kernel/csrc/moe_align_block_size_kernels.cu#L115)
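For context, here is a rough sketch of the device-guard pattern I mean, written in the style of a PyTorch CUDA extension. The kernel and launcher names below are placeholders, not the actual vLLM code; the real change lives in `csrc/moe_align_block_size_kernels.cu`.

```cpp
// Sketch only: the c10 CUDAGuard pattern that makes the input tensor's device
// current before launching a kernel. Without it, the kernel can be launched on
// whatever device happens to be current, which is a common source of illegal
// memory access in multi-GPU setups.
#include <ATen/ATen.h>
#include <ATen/DeviceGuard.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

// Placeholder kernel, not the real moe_align_block_size kernel.
__global__ void copy_ids_kernel(const int32_t* src, int32_t* dst, int n) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    dst[idx] = src[idx];
  }
}

void copy_ids(at::Tensor src, at::Tensor dst) {
  // The device guard: switch the current CUDA device to src's device for the
  // lifetime of this scope, and restore the previous device on exit.
  const at::cuda::OptionalCUDAGuard device_guard(at::device_of(src));
  const cudaStream_t stream = at::cuda::getCurrentCUDAStream();

  const int n = static_cast<int>(src.numel());
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  copy_ids_kernel<<<blocks, threads, 0, stream>>>(
      src.data_ptr<int32_t>(), dst.data_ptr<int32_t>(), n);  // assumes int32 tensors
}
```

Roughly that pattern (an `OptionalCUDAGuard` placed before the launch) is what the link above is pointing at.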