
Allow passing hf config args with openai server

Open Aakash-kaushik opened this issue 1 year ago • 7 comments

Hi,

Is there a specific reason why we can't allow passing args from the OpenAI server to the HF config class? There are very reasonable use cases where I would want to override existing args in a config while running the model dynamically through the server.

reference line

Simply allowing kwargs in the OpenAI server that are passed through to this while loading the model would be enough; I believe there are internal checks that fail if anything is misconfigured anyway.

Supporting documentation from the transformers library:

        >>> # Change some config attributes when loading a pretrained config.
        >>> config = AutoConfig.from_pretrained("bert-base-uncased", output_attentions=True, foo=False)
        >>> config.output_attentions
        True

Aakash-kaushik avatar Jan 22 '24 09:01 Aakash-kaushik
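[Editor's note: the override semantics being requested can be sketched in plain Python. `load_config_with_overrides` is an illustrative name, not a vLLM or transformers API; it mirrors what `AutoConfig.from_pretrained(**kwargs)` does with extra keyword arguments.]

```python
# Sketch of the requested behavior: kwargs supplied at load time replace
# matching keys in the stored config. All names here are illustrative.

def load_config_with_overrides(stored_config: dict, **overrides) -> dict:
    """Return a copy of the stored config with the given keys overridden."""
    config = dict(stored_config)
    config.update(overrides)
    return config

# Config as it might appear in a model's config.json
stored = {"output_attentions": False, "max_position_embeddings": 2048}

# Dynamically override at load time, as the issue proposes
config = load_config_with_overrides(stored, output_attentions=True)
print(config["output_attentions"])  # True
```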

I believe there's no fundamental reason for this. Contributions welcome! I would say you can add this to the ModelConfig class and pass it through EngineArgs.

simon-mo avatar Jan 23 '24 20:01 simon-mo
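[Editor's note: one way such a flag could be plumbed through a CLI is sketched below. The `--hf-config-overrides` flag and its parsing are hypothetical illustrations, not existing vLLM engine args.]

```python
# Hypothetical CLI plumbing: accept a JSON dict of HF config overrides
# and forward it to the config loader. Flag name is an assumption.
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True)
parser.add_argument(
    "--hf-config-overrides",
    type=json.loads,
    default={},
    help="JSON dict of kwargs forwarded to AutoConfig.from_pretrained",
)

args = parser.parse_args(
    ["--model", "bert-base-uncased",
     "--hf-config-overrides", '{"output_attentions": true}']
)

# The parsed dict would then be splatted into the HF config loader, e.g.:
#   AutoConfig.from_pretrained(args.model, **args.hf_config_overrides)
print(args.hf_config_overrides)  # {'output_attentions': True}
```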

I will take a look at this

KrishnaM251 avatar Jan 25 '24 21:01 KrishnaM251

Does anyone have news about this? I want to use --dtype, but it doesn't work.

mrPsycox avatar Feb 07 '24 17:02 mrPsycox

@mrPsycox --dtype is supported in vLLM; please take a look at the engine args in the vLLM docs.

Aakash-kaushik avatar Feb 07 '24 18:02 Aakash-kaushik

Thanks @Aakash-kaushik, I found the issue. --dtype needs to be among the first args of the command, not the last.

This works for me:

 run: |
   conda activate vllm
   python -m vllm.entrypoints.openai.api_server \
     --tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
     --dtype half \
     --host 0.0.0.0 --port 8080 \
     --model <model_name>

mrPsycox avatar Feb 08 '24 09:02 mrPsycox

Just as a workaround, I am currently doing something like this:

import os
from contextlib import contextmanager

from vllm import LLM

@contextmanager
def swap_files(file1, file2):
    try:
        temp_file1 = file1 + '.temp'
        temp_file2 = file2 + '.temp'

        # Swap the two files via a temporary name.
        print("Renaming files.")
        os.rename(file1, temp_file1)
        os.rename(file2, file1)
        os.rename(temp_file1, file2)

        yield

    finally:
        # Swap them back, even if the body raised.
        print("Restoring files.")
        os.rename(file2, temp_file2)
        os.rename(file1, file2)
        os.rename(temp_file2, file1)

file1 = '/path/to/original/config.json'
file2 = '/path/to/modified/config.json'

with swap_files(file1, file2):
    llm = LLM(...)

timbmg avatar Apr 30 '24 15:04 timbmg
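[Editor's note: a related workaround, sketched below, avoids mutating the original checkout: copy the model directory, edit `config.json` in the copy, and point the engine at the copy. `make_overridden_model_dir` and the paths are illustrative assumptions, not vLLM APIs.]

```python
# Workaround sketch: apply config overrides to a throwaway copy of a
# local model directory instead of swapping files in place.
import json
import shutil
import tempfile
from pathlib import Path

def make_overridden_model_dir(model_dir: str, **overrides) -> str:
    """Copy a local model directory and apply overrides to its config.json."""
    dest = Path(tempfile.mkdtemp()) / Path(model_dir).name
    shutil.copytree(model_dir, dest)
    config_path = dest / "config.json"
    config = json.loads(config_path.read_text())
    config.update(overrides)
    config_path.write_text(json.dumps(config, indent=2))
    return str(dest)

# Usage (hypothetical local checkout):
# patched = make_overridden_model_dir("/path/to/model", output_attentions=True)
# llm = LLM(model=patched)
```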

I would love to see this as well

K-Mistele avatar May 06 '24 02:05 K-Mistele

@Aakash-kaushik @mrPsycox @timbmg @K-Mistele

Please take a look at my PR and let me know if it serves your purpose.

As @DarkLight1337 noted in my PR (#5836): what exactly do you want to accomplish using this feature that cannot otherwise be done via vLLM args? (If there is no situation that results in different vLLM output, what is the point of enabling this?)

Once you get back to me, I'll write a test that covers that case.

KrishnaM251 avatar Jun 27 '24 22:06 KrishnaM251