Jee Jee Li
> > See:
> > ```
> > ValueError: Prequant BitsAndBytes models with TP is not supported.Please try with PP.
> > ```
> >
> > thanks for the response, may...
FYI: https://github.com/vllm-project/vllm/pull/6140
@jdf-prog #6140 has addressed this issue; you can update your vLLM version and try it out.
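For context, the error above suggests switching from tensor parallelism (TP) to pipeline parallelism (PP) for prequantized BitsAndBytes models. A minimal launch sketch of that workaround might look like the following; the model name is only a hypothetical example, and the exact flag set may differ across vLLM versions:

```shell
# Sketch: serve a prequantized BitsAndBytes model with pipeline
# parallelism instead of tensor parallelism, since TP was not
# supported for these models before #6140.
# The model name below is a placeholder, not from this thread.
vllm serve some-org/some-bnb-4bit-model \
    --quantization bitsandbytes \
    --load-format bitsandbytes \
    --pipeline-parallel-size 2
```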
IIUC, although this PR is related to LoRA loading, it seems you haven't touched the underlying LoRA logic. What you might need is to add unit tests similar to #6566....
See: [Llama-3_1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#use-it-with-vllm)
Have you tried removing `--enforce-eager`?
On the current main branch, I ran [Llama-3_1-Nemotron-Ultra-253B-v1](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#use-it-with-vllm), and simply by removing `--enforce-eager`, the service started normally.
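For reference, the launch command from the linked HF model card, minus `--enforce-eager`, would look roughly like this; treat the parallelism size as an assumption that depends on your hardware:

```shell
# Sketch: serve the model with CUDA graph capture enabled
# (i.e. without --enforce-eager). The TP size is hardware-dependent.
vllm serve nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 \
    --trust-remote-code \
    --tensor-parallel-size 8
```

Dropping `--enforce-eager` lets vLLM capture CUDA graphs, which usually improves decoding throughput at the cost of longer startup time.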
Which vLLM version are you using?
Contributions are welcome!