Simon Mo
I think this could be a good idea. Are you thinking of offline evaluation using the `LLM` interface, or the server? Any thoughts, @Yard1 @zhuohan123?
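For reference, a minimal sketch of what offline evaluation through the `LLM` interface could look like (the model name, prompts, and sampling settings below are placeholders, not part of the original discussion):

```python
from vllm import LLM, SamplingParams

# Load a model once and run batched, offline generation over the eval set.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

prompts = [
    "What is the capital of France?",
    "Summarize: vLLM is a fast inference engine.",
]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Each RequestOutput carries the prompt and its generated completion(s).
    print(output.prompt, "->", output.outputs[0].text)
```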
@GeauxEric please feel free to open a PR so it's easier to get feedback.
This script can help verify that this works end to end: https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py
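As a rough sketch of the flow that example exercises (the base model and adapter path below are placeholders, not taken from the linked script):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Enable LoRA support on the engine, then attach an adapter per request.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)

# LoRARequest(name, int_id, local_path) selects which adapter serves this request.
lora_request = LoRARequest("sql-lora", 1, "/path/to/sql-lora-adapter")

outputs = llm.generate(
    ["Write a SQL query that counts users by country."],
    sampling_params,
    lora_request=lora_request,
)
print(outputs[0].outputs[0].text)
```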
> My previous comment is about the truncation side; for various reasons/formats we'd want to trim from either the left or the right, and since it's already a parameter...
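To illustrate the left- vs. right-truncation behaviour being discussed, here is a small sketch using a Hugging Face tokenizer's `truncation_side` attribute (this is a generic tokenizer example, not the vLLM parameter itself):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "one two three four five six seven eight"

# Right truncation keeps the beginning of the prompt and drops the end.
tokenizer.truncation_side = "right"
kept_head = tokenizer(text, truncation=True, max_length=4)

# Left truncation keeps the end of the prompt and drops the beginning.
tokenizer.truncation_side = "left"
kept_tail = tokenizer(text, truncation=True, max_length=4)

print(tokenizer.decode(kept_head["input_ids"]))
print(tokenizer.decode(kept_tail["input_ids"]))
```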
I trust @njhill to decide and merge.
Setting tp=4 (tensor parallelism across 4 GPUs) is the most effective fix.
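Concretely, that corresponds to the `tensor_parallel_size` argument (the model name here is just a placeholder):

```python
from vllm import LLM

# Shard the model across 4 GPUs with tensor parallelism.
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=4)
```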
@architkulkarni
We now support the full range of constrained/guided decoding, powered by Outlines. Closing this as completed.
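A hedged sketch of how this can be used against the OpenAI-compatible server (the base URL, model name, and choices below are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[{"role": "user", "content": "Is vLLM fast? Answer yes or no."}],
    # vLLM-specific extension: constrain the output to one of the listed choices.
    extra_body={"guided_choice": ["yes", "no"]},
)
print(completion.choices[0].message.content)
```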
If anyone has bandwidth to help us implement ChatGLM support, please leave a comment and coordinate here: https://github.com/vllm-project/vllm/issues/1552