Jee Jee Li
Yes, no problem. So, the Dockerfile provides `modelscope`, and for other deployment methods, error messages guide users to install `modelscope`. We can catch the ImportError similar to how it's done...
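A minimal sketch of the guarded-import pattern described above, assuming the goal is to surface a helpful install hint when `modelscope` is missing (the wrapper function name is just for illustration):

```python
def get_snapshot_download():
    """Return modelscope's snapshot_download, or raise a helpful ImportError."""
    try:
        # Import lazily so modelscope stays an optional dependency.
        from modelscope import snapshot_download
    except ImportError as err:
        # Re-raise with guidance, keeping the original traceback via `from`.
        raise ImportError(
            "This code path requires `modelscope`. "
            "Install it with `pip install modelscope`."
        ) from err
    return snapshot_download
```

Callers that never hit this path pay no import cost, and users who do hit it get an actionable message instead of a bare `ModuleNotFoundError`.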
I don't have any bias, I'm just describing the current situation. @DarkLight1337 WDYT
This is because the QwQ chat template contains ``, as shown in your first screenshot.
Which version are you using? It looks like it's outdated.
I can generate reasonable results by using the latest main branch with `gemma-2-9b`. I think you can upgrade vllm to 0.7.3, then try it again
I will try to reproduce your results.
I cannot reproduce the result you reported. I suspect it might be due to the influence of prefix caching. The impact of LoRA (just using `--enable-lora`) is not very significant.
@badrjd I didn't use the above image for testing. I built it locally based on the main branch. Maybe you could try that.
> the same problem

@cmccxll Are you also using the above image for testing?
@badrjd Has this issue been resolved for you? If not, you can try adding `--max-seq-len-to-capture 48000`; it's most likely due to that.
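For reference, a hedged example of passing that flag to the OpenAI-compatible server (the model name is a placeholder, not taken from this thread):

```shell
# Raise the max sequence length covered by CUDA graph capture to 48000,
# as suggested above; longer sequences fall back to eager mode.
vllm serve <your-model> --max-seq-len-to-capture 48000
```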