Jee Jee Li
> > See:
> > ```
> > ValueError: Prequant BitsAndBytes models with TP is not supported.Please try with PP.
> > ```
> >
> > thanks for the response, may...
FYI: https://github.com/vllm-project/vllm/pull/6140
@jdf-prog #6140 has addressed this issue; you can update your vLLM version and try it out.
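For context, the error above suggests switching from tensor parallelism (TP) to pipeline parallelism (PP) for prequantized BitsAndBytes models. A minimal launch sketch of that workaround might look like the following; the model name is only a hypothetical example, and the exact flag set may differ across vLLM versions:

```shell
# Sketch: serve a prequantized BitsAndBytes model with pipeline
# parallelism instead of tensor parallelism, since TP was not
# supported for these models before #6140.
# The model name below is a placeholder, not from this thread.
vllm serve some-org/some-bnb-4bit-model \
    --quantization bitsandbytes \
    --load-format bitsandbytes \
    --pipeline-parallel-size 2
```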
IIUC, although this PR is related to LoRA loading, it seems you haven't touched the underlying LoRA logic. What you might need is to add unit tests similar to #6566....
See: [Llama-3_1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#use-it-with-vllm)
Have you tried removing `--enforce-eager`?
On the current main branch, I ran [Llama-3_1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#use-it-with-vllm), and simply by removing `--enforce-eager`, the service started normally.
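For reference, the launch command from the linked HF model card, minus `--enforce-eager`, would look roughly like this; treat the parallelism size as an assumption that depends on your hardware:

```shell
# Sketch: serve the model with CUDA graph capture enabled
# (i.e. without --enforce-eager). The TP size is hardware-dependent.
vllm serve nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 \
    --trust-remote-code \
    --tensor-parallel-size 8
```

Dropping `--enforce-eager` lets vLLM capture CUDA graphs, which usually improves decoding throughput at the cost of longer startup time.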
Which vLLM version are you using?
Contributions are welcome!