Jee Jee Li

Results: 206 comments by Jee Jee Li

> Strongly support this proposal. From an engineering perspective, prioritizing LoRA support for only the attention layers ('q_proj', 'k_proj', 'v_proj', 'o_proj') in the initial Qwen 3 MoE integration would be...
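For context, here is a minimal sketch of what restricting LoRA to exactly those four attention projections looks like on the training side, assuming a PEFT-style setup; the rank, alpha, and task type values are illustrative and not taken from the proposal:

```python
from peft import LoraConfig

# LoRA applied only to the attention projections, mirroring the scope
# proposed for the initial Qwen3 MoE integration; the expert/MLP weights
# are left untouched.
lora_config = LoraConfig(
    r=8,                      # illustrative rank
    lora_alpha=16,            # illustrative scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
print(lora_config.target_modules)
```

Adapters trained with this scope only add weights for the attention projections, which is why attention-only serving support is sufficient for them.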

Upgrading to Triton 3.4 or downgrading to Triton 3.2 can fix this issue.
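If it is unclear which Triton build is currently active, a quick way to check before pinning (the pip pins in the comment are simply the two versions mentioned above):

```python
from importlib.metadata import version

# Report the installed Triton version; the issue above is avoided by
# moving forward to 3.4 or back to 3.2, e.g.:
#   pip install "triton==3.4.*"    # or: pip install "triton==3.2.*"
print("triton", version("triton"))
```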

@DarkLight1337 Do you know what's causing the current CI failures?

> I really have no idea how to fix this. Any suggestions?

Don't worry about this issue; a committer can fix it directly.

> I can't get much info using CUDA_LAUNCH_BLOCKING=1
>
> ```
> ERROR 08-16 04:15:24 async_llm_engine.py:53] File "/opt/vllm/vllm/engine/async_llm_engine.py", line 247, in step_async
> ERROR 08-16 04:15:24 async_llm_engine.py:53] output = await...
> ```
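One thing to double-check is that CUDA_LAUNCH_BLOCKING is set before any CUDA context is created, otherwise it has no effect. A minimal repro sketch (the model name and prompt are illustrative, not the failing setup):

```python
import os

# CUDA_LAUNCH_BLOCKING only takes effect if it is set before the CUDA
# context is created, i.e. before torch/vLLM touch the GPU.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM, SamplingParams

# Illustrative model and prompt; substitute the failing configuration.
llm = LLM(model="facebook/opt-125m")
out = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(out[0].outputs[0].text)
```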

> > Considering that our current MoE layer doesn't support LoRA yet, llama4 may not be able to fully support LoRA
>
> @jeejeelee, we probably need to ask...

It looks like the error is not related to LoRA.

```shell
ERROR 02-10 15:42:31 engine.py:389] Following weights were not initialized from checkpoint: {'apm.layers.18.self_attn_layer_norm.weight', 'apm.layers.13.self_attn.v_proj.weight', 'apm.layers.0.fc2.bias', 'apm.layers.16.self_attn.q_proj.bias', 'apm.layers.8.self_attn.out_proj.weight', 'apm.layers.17.self_attn.v_proj.bias', 'apm.layers.3.fc1.bias', 'apm.layers.11.self_attn.q_proj.bias', ...
```
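For debugging messages like this, one rough way to see which parameter names never line up with the checkpoint is a plain key diff. This is a hypothetical helper, assuming a single safetensors shard; real loaders often rename or fuse weights while loading, so treat the output only as a guide:

```python
from safetensors import safe_open
import torch


def weight_name_diff(model: torch.nn.Module, checkpoint_path: str):
    """Compare model parameter names against a checkpoint's tensor names.

    Returns (missing_in_checkpoint, unused_in_checkpoint).
    """
    with safe_open(checkpoint_path, framework="pt") as f:
        ckpt_keys = set(f.keys())
    model_keys = set(model.state_dict().keys())
    return model_keys - ckpt_keys, ckpt_keys - model_keys


# Usage (model and path are illustrative):
# missing, unused = weight_name_diff(my_model, "model-00001-of-00002.safetensors")
# print(sorted(missing))
```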