
How to integrate with HF with minimal modification?

Open lucasjinreal opened this issue 1 year ago • 7 comments

I see that the examples are all wrappers around vLLM. How can I integrate it with HF and get an out-of-the-box boost for my existing model?

lucasjinreal avatar Jun 21 '23 03:06 lucasjinreal

Thanks for your interest and great question! You can install vLLM from source and directly modify the model code.

WoosukKwon avatar Jun 21 '23 03:06 WoosukKwon

This is a huge change. Is there any easier way to do this with LLaMA? I don't want to insert this code into my existing transformers-based project.

lucasjinreal avatar Jun 21 '23 05:06 lucasjinreal

> Thanks for your interest and great question! You can install vLLM from source and directly modify the model code.

Can you point out in the documentation which modifications are necessary, or provide a tutorial on the steps for modifying a model?

The "Rewrite the forward methods" section in the documentation is too brief.

liujuncn avatar Jun 21 '23 07:06 liujuncn

@lucasjinreal Is your model different from the original LLaMA? If not, you can simply pass the path to your model weights in llm = LLM(model=<path to your model>) and use the llm object and its generate method in your code.

WoosukKwon avatar Jun 21 '23 08:06 WoosukKwon
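
For readers following along, here is a minimal sketch of that suggestion, assuming the weights are a standard HF-format LLaMA checkpoint (the path below is a placeholder):

```python
from vllm import LLM, SamplingParams

# Placeholder path: a directory containing HF-format LLaMA weights and config.
llm = LLM(model="/path/to/your/llama-weights")

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)

for output in outputs:
    # Each RequestOutput holds the prompt and its generated completions.
    print(output.outputs[0].text)
```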

@liujuncn Thanks for your feedback. We'll describe more details in the doc. In order to address your issue quickly, could you share with us the specific model you're interested in using with vLLM? Depending on the model architecture, we might be able to incorporate support for it promptly.

WoosukKwon avatar Jun 21 '23 08:06 WoosukKwon

@WoosukKwon Can you be more specific?

For example, I have an HF-based model:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    load_in_8bit=load_in_8bit,
    device_map="auto",
)
```

How can I specify the equivalent of these from_pretrained parameters here? Are the weights in the same format vLLM expects? And how can I specify the precision, e.g. fp16 or bf16?

Also, my generate loop uses streaming; is that supported?

lucasjinreal avatar Jun 21 '23 11:06 lucasjinreal
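
A rough sketch of how those from_pretrained arguments might map onto vLLM's LLM constructor, assuming the dtype and trust_remote_code parameters described in the vLLM docs; load_in_8bit has no direct counterpart here, and streaming is served through vLLM's async engine / API server rather than the offline generate call:

```python
from vllm import LLM, SamplingParams

base_model_path = "/path/to/your/model"  # placeholder: same HF-format directory

# Assumed mapping of the HF from_pretrained arguments above:
#   torch_dtype=torch.float16  ->  dtype="float16" (or "bfloat16")
#   trust_remote_code=True     ->  trust_remote_code=True
#   load_in_8bit / device_map  ->  no direct equivalents in this call
llm = LLM(
    model=base_model_path,
    dtype="float16",
    trust_remote_code=True,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```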

> @liujuncn Thanks for your feedback. We'll describe more details in the doc. In order to address your issue quickly, could you share with us the specific model you're interested in using with vLLM? Depending on the model architecture, we might be able to incorporate support for it promptly.

For example, x-transformers: https://github.com/lucidrains/x-transformers

With it we can choose to combine different tricks. So how would a custom model architecture be possible using vLLM?

liujuncn avatar Jun 22 '23 09:06 liujuncn