LLM-VM
Add parallel sampling using vLLM
close #370
Adds support for parallel sampling using the vLLM library. The vLLM path is used when num_return_sequences in the generation kwargs is greater than 1 and the model is supported by vLLM (currently all HF models in LLM-VM).
TODO: handle dependencies
Made the suggested changes. vllm_support is set to true by default and must be explicitly set to false for unsupported models.
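For reviewers, here is a minimal sketch of the routing described above. It is not the code in this PR. The helper names (should_use_vllm, to_vllm_sampling_kwargs, sample_with_vllm) and the set of kwargs that get translated are assumptions made for illustration. The only parts taken from vLLM itself are LLM, SamplingParams, and LLM.generate. vLLM is imported lazily so the rest of the package still loads when it is not installed, which is one possible way to approach the dependency TODO.

```python
# Hypothetical helpers: names and kwarg mapping are illustrative, not from the PR.

# Hugging Face generation kwargs that have a differently named vLLM equivalent.
_RENAMED = {"num_return_sequences": "n", "max_new_tokens": "max_tokens"}
# Kwargs that share the same name in both libraries.
_SHARED = ("temperature", "top_p", "top_k")


def should_use_vllm(generation_kwargs, vllm_support=True):
    """Route to vLLM only when several sequences are requested and the model supports it."""
    n = generation_kwargs.get("num_return_sequences", 1)
    return bool(vllm_support and n > 1)


def to_vllm_sampling_kwargs(generation_kwargs):
    """Translate the subset of HF generation kwargs handled in this sketch."""
    out = {}
    for hf_name, vllm_name in _RENAMED.items():
        if hf_name in generation_kwargs:
            out[vllm_name] = generation_kwargs[hf_name]
    for name in _SHARED:
        if name in generation_kwargs:
            out[name] = generation_kwargs[name]
    # Anything else (e.g. do_sample) is dropped here; a real implementation
    # would need to decide how to handle unsupported kwargs.
    return out


def sample_with_vllm(model_name, prompt, generation_kwargs):
    """Generate n completions for one prompt in a single vLLM call."""
    # Lazy import: vLLM is only required when this path is actually taken.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(**to_vllm_sampling_kwargs(generation_kwargs))
    request_outputs = llm.generate([prompt], params)
    # One RequestOutput per prompt; each holds n CompletionOutput objects.
    return [completion.text for completion in request_outputs[0].outputs]
```

In this sketch a caller would check should_use_vllm(kwargs, vllm_support) first and fall back to the existing Hugging Face generate path when it returns False, so setting vllm_support to false for an unsupported model keeps the current behaviour.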