Qwen3 MoE LoRA error
Full fine-tuning works fine, but LoRA fails with the following traceback:
```
(WorkerDict pid=502574) Exception in thread Thread-3 (_loop_forever):
(WorkerDict pid=502574) Traceback (most recent call last):
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
(WorkerDict pid=502574)     self.run()
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/threading.py", line 953, in run
(WorkerDict pid=502574)     self._target(*self._args, **self._kwargs)
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py", line 453, in _loop_forever
(WorkerDict pid=502574)     result = self.execute_method(method, *args, **kwargs)
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py", line 499, in execute_method
(WorkerDict pid=502574)     return self.wake_up(*args, **kwargs)
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py", line 488, in wake_up
(WorkerDict pid=502574)     self.sharding_manager.__enter__()  # pylint: disable=C2801
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/verl/utils/profiler/performance.py", line 89, in f
(WorkerDict pid=502574)     return self.log(decorated_function, *args, **kwargs)
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/verl/utils/profiler/performance.py", line 102, in log
(WorkerDict pid=502574)     output = func(*args, **kwargs)
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/verl/workers/sharding_manager/fsdp_vllm.py", line 217, in __enter__
(WorkerDict pid=502574)     self.update_params(params, peft_config=peft_config)
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/verl/workers/sharding_manager/fsdp_vllm.py", line 294, in update_params
(WorkerDict pid=502574)     self.inference_engine.llm_engine.add_lora(lora_reqest)
(WorkerDict pid=502574)   File "/media/hdd4tb/sankuai/env/sk_rl_3.10/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 626, in __getattr__
(WorkerDict pid=502574)     return getattr(self.worker, attr)
(WorkerDict pid=502574) AttributeError: 'Worker' object has no attribute 'llm_engine'
```
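The failure happens because `update_params` reaches for `inference_engine.llm_engine.add_lora(...)`, and under the vLLM v1 API the `Worker` object that `__getattr__` forwards to no longer exposes an `llm_engine` attribute. A minimal sketch of a compatibility shim (my assumption of a workaround, not verl's actual fix — `add_lora_compat` is a hypothetical helper name, and the attribute layout mirrors the traceback above rather than a verified vLLM API):

```python
def add_lora_compat(inference_engine, lora_request):
    """Dispatch add_lora across old/new vLLM engine layouts (sketch).

    Old path: inference_engine.llm_engine.add_lora(lora_request) works.
    v1 path: the 'llm_engine' attribute lookup is forwarded to a Worker
    that doesn't have it (raising the AttributeError above), so we fall
    back to calling add_lora on the engine object itself, assuming it
    exposes one directly.
    """
    # Probe for the legacy nested engine without triggering attribute
    # forwarding errors; getattr with a default swallows the lookup.
    engine = getattr(inference_engine, "llm_engine", None)
    if engine is not None and hasattr(engine, "add_lora"):
        return engine.add_lora(lora_request)
    # vLLM v1-style fallback: call add_lora on the top-level object.
    return inference_engine.add_lora(lora_request)
```

In practice `lora_request` would be a `vllm.lora.request.LoRARequest`; the shim itself only cares about where `add_lora` lives.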
Related reports found in verl:

- [Bug] verl 0.5.0 is incompatible with vLLM v1 API when using LoRA: https://github.com/volcengine/verl/issues/3271
- Fix async LoRA support for vLLM v1 + FSDP: https://github.com/rllm-org/rllm/pull/239
I think some effort is needed to support the latest verl, since they moved the vLLM inference server into the agent loop. Will look into it.