Jack BAI
Update: the error is still not resolved even when using the **original config** downloaded from HF.
> If this issue is still Open @BiEchi?

Yes, this problem is not resolved yet.
Did you pull the repo and compile yourself? If you're using the pip pre-built version, there might be some discrepancies.
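To help narrow that down, here is a minimal sketch for checking which build is actually in use (the exact version string and paths will differ per environment):

```python
# Check whether vLLM is a pip wheel or a local source build.
import vllm

print(vllm.__version__)  # dev/source builds often carry a "+<sha>" or ".dev" suffix
print(vllm.__file__)     # site-packages path for a wheel vs. a repo path for an editable install
```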
Got it. @hmellor, do you know anyone on the repo team who might be interested in this vLLM bug?
A minor update: if we pass `enforce_eager` to the V1 engine, this problem no longer appears, but generation gets stuck at 80%...
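For reference, a minimal sketch of that workaround, assuming a vLLM version where the V1 engine is selected via the `VLLM_USE_V1` environment variable; the model name is a placeholder:

```python
import os

# Select the V1 engine; set before importing vllm to be safe.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

# enforce_eager=True disables CUDA graph capture and runs in eager mode.
llm = LLM(model="google/gemma-3-4b-it", enforce_eager=True)
outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```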
@DarkLight1337 @WoosukKwon I also managed to run the saved model with Hugging Face. I believe this is a vLLM issue while loading the model. I suspect there are some differences between HF...
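For context, a sketch of that cross-check: loading the same saved checkpoint with Hugging Face Transformers. `./saved_model` is a hypothetical path standing in for the actual checkpoint directory:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint that fails under vLLM and verify it generates under HF.
tokenizer = AutoTokenizer.from_pretrained("./saved_model")
model = AutoModelForCausalLM.from_pretrained("./saved_model", device_map="auto")

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```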
I am not observing this as of today for Gemma-3 with the latest version of vLLM.