Allow passing HF config args with the OpenAI server
Hi,
Is there a specific reason why we can't allow passing args from the OpenAI server through to the HF config class? There are very reasonable use cases where I would want to override existing args in a config while running the model dynamically through the server.
We could simply allow extra args in the OpenAI server that are passed to the config while loading the model; I believe there are internal checks that fail if anything configured is wrong anyway.
This is supported and documented in the transformers library:
>>> # Change some config attributes when loading a pretrained config.
>>> config = AutoConfig.from_pretrained("bert-base-uncased", output_attentions=True, foo=False)
>>> config.output_attentions
True
I believe there's no fundamental reason for this restriction. Contributions welcome! I would say you can add this to the ModelConfig class and pass it through EngineArgs.
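For reference, here is a minimal sketch of what that could look like. The `hf_config_overrides` field, the JSON-string encoding, and the helper below are all hypothetical illustrations, not actual vLLM API:

```python
# Hypothetical sketch of threading HF config overrides through EngineArgs.
# Names like `hf_config_overrides` are illustrative, not real vLLM fields.
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional

from transformers import AutoConfig

@dataclass
class EngineArgs:
    model: str
    # e.g. '{"output_attentions": true}', taken from a CLI flag
    hf_config_overrides: Optional[str] = None

def load_hf_config(args: EngineArgs):
    overrides: Dict[str, Any] = (
        json.loads(args.hf_config_overrides) if args.hf_config_overrides else {}
    )
    # transformers applies matching kwargs as attributes on the loaded
    # config, as in the AutoConfig example above.
    return AutoConfig.from_pretrained(args.model, **overrides)
```

A JSON string would keep the CLI surface to a single flag while still allowing arbitrary keys, and any internal validation could still reject bad values, as noted above.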
I will take a look at this.
Does anyone have news about this? I want to use --dtype, but it doesn't work.
@mrPsycox --dtype is supported in vLLM; please take a look at the engine args in the vLLM docs.
Thanks @Aakash-kaushik, I found the issue. Passing --dtype needs to come among the first arguments of the command, not at the end.
This works for me:
run: |
  conda activate vllm
  python -m vllm.entrypoints.openai.api_server \
    --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
    --dtype half \
    --host 0.0.0.0 --port 8080 \
    --model <model_name>
Just as a workaround, I am currently doing something like this:
import os
from contextlib import contextmanager

from vllm import LLM

@contextmanager
def swap_files(file1, file2):
    try:
        temp_file1 = file1 + '.temp'
        temp_file2 = file2 + '.temp'
        print("Renaming Files.")
        # Swap the two configs: the modified file ends up at the original path.
        os.rename(file1, temp_file1)
        os.rename(file2, file1)
        os.rename(temp_file1, file2)
        yield
    finally:
        print("Restoring Files.")
        # Swap back so both files return to their original paths.
        os.rename(file2, temp_file2)
        os.rename(file1, file2)
        os.rename(temp_file2, file1)

file1 = '/path/to/original/config.json'
file2 = '/path/to/modified/config.json'

with swap_files(file1, file2):
    llm = LLM(...)
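A variation on the same workaround, in case it helps: copying instead of renaming means the original config also survives at a backup path even if the process is killed inside the with block. This is just a sketch along the same lines, with the same placeholder paths:

```python
import shutil
from contextlib import contextmanager

from vllm import LLM

@contextmanager
def override_config(original, modified):
    backup = original + '.bak'
    # Keep a backup copy of the original, then overwrite it with the
    # modified config; the original file is never left missing on disk.
    shutil.copy2(original, backup)
    shutil.copy2(modified, original)
    try:
        yield
    finally:
        # Restore the original config and remove the backup.
        shutil.move(backup, original)

with override_config('/path/to/original/config.json',
                     '/path/to/modified/config.json'):
    llm = LLM(...)
```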
I would love to see this as well.
@Aakash-kaushik @mrPsycox @timbmg @K-Mistele
Please take a look at my PR and let me know if it serves your purpose.
As @DarkLight1337 noted in my PR (#5836), what exactly do you want to accomplish with this feature that cannot otherwise be done via vLLM args? (If there is no situation where it results in different vLLM output, what is the point of enabling it?)
Once you get back to me, I'll write a test that covers that case.