
Results 51 comments of paolovic

Hi @manitadayon , is it possible that you hit your OOM error while capturing the CUDA graph? `enforce_eager=True` is a way to circumvent this particular OOM during CUDA graph...
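For reference, the equivalent of `enforce_eager=True` on the CLI is the `--enforce-eager` flag, which skips CUDA graph capture entirely. A minimal sketch (the model path and other flags are illustrative, not the exact command from this thread):

```shell
# Skip CUDA graph capture to avoid the capture-time OOM
# (model path is an example, not the reporter's exact setup)
vllm serve Llama-3_3-Nemotron-Super-49B-v1-4bit-GPTQ/ \
  --trust-remote-code \
  --enforce-eager
```

The trade-off is lower decode throughput, since every forward pass runs in eager mode instead of replaying a captured graph.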

alright, I'm quantizing nvidia/Llama-3_3-Nemotron-Super-49B-v1 to 4-bit GPTQ right now

Hi @manitadayon , nice, I was able to reproduce the error. Same machine, 2x Nvidia L40s, `vllm 0.8.3`. 1. V0 works as follows: ```bash CUDA_VISIBLE_DEVICES=0 VLLM_USE_V1=0 vllm serve Llama-3_3-Nemotron-Super-49B-v1-4bit-GPTQ/ --trust-remote-code...
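A sketch of the V0/V1 engine toggle being compared here; the flags after `--trust-remote-code` were truncated above, so everything beyond the environment variables is an assumption:

```shell
# Force the legacy V0 engine (the configuration that worked) on a single GPU.
# Setting VLLM_USE_V1=1, or leaving it unset on newer vLLM releases,
# selects the V1 engine, which triggered the reported error.
CUDA_VISIBLE_DEVICES=0 VLLM_USE_V1=0 vllm serve \
  Llama-3_3-Nemotron-Super-49B-v1-4bit-GPTQ/ \
  --trust-remote-code
```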

> Thank you. Oh, you are able to run the model on 1 GPU in V0 version (only 48GB memory)? (since you set the CUDA visible device to only 0)....

Hi @casper-hansen , alright, as soon as I have time for that, I will dig into it. First, I assume I'll have to master Ray. Thank you for the quick...

nice work, thank you @sapristi!

The same holds for me, as also described in https://github.com/vllm-project/vllm/issues/4416 When trying to load a GGUF model, e.g., https://huggingface.co/bartowski/reader-lm-1.5b-GGUF , vLLM requires a `config.json` although the new (?) GGUF...

> Hey @paolovic, > > Yes, this error occurs because vLLM is currently not looking for `.gguf` files inside the folder but instead assumes you pass the `model` as the...
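Based on that explanation, the workaround would be to pass the `.gguf` file itself as `model` rather than the containing folder. A sketch, where the quant filename and the `--tokenizer` repo are assumptions on my part (GGUF files don't ship a Hugging Face tokenizer, so vLLM needs one supplied separately):

```shell
# Point vLLM at the .gguf file directly, not the download directory.
# Filename and tokenizer repo are illustrative assumptions.
vllm serve ./reader-lm-1.5b-GGUF/reader-lm-1.5b-Q8_0.gguf \
  --tokenizer jinaai/reader-lm-1.5b
```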