Jee Jee Li

Results 248 comments of Jee Jee Li

> > See:
> > ```
> > ValueError: Prequant BitsAndBytes models with TP is not supported.Please try with PP.
> > ```
>
> thanks for the response, may...

IIUC, although this PR is related to LoRA loading, it seems you haven't touched the underlying LoRA logic. What you might need is to add unit tests similar to #6566....

See: [Llama-3_1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#use-it-with-vllm)

Have you tried removing `--enforce-eager`?

On a recent main branch, I tested [Llama-3_1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#use-it-with-vllm), and the service started normally once I removed `--enforce-eager`.