
How to use Medusa to support non llama models?

skyCreateXian opened this issue 1 year ago · 8 comments

System Info

Hardware: L20
Version: 0.11.0.dev20240625
Model: Bloom-7b1

Who can help?

@ncomly-nvidia @byshiue I have obtained the Medusa heads for Bloom by following the official Medusa documentation, but to deploy them I need to modify bloom/model.py. I adapted a version by referencing llama/model.py, but the accuracy is very poor. Therefore, I have two questions:

  1. Does Medusa support deploying models other than the LLaMA family?
  2. For other models' model.py, please provide official Medusa modification tips, like the '[MODIFIED]' markers in this reference: https://github.com/FasterDecoding/Medusa/blob/main/medusa/model/modeling_llama_kv.py. I mainly adapted the spec_decoding_params parameter in bloom/model.py (a sketch of the tree-attention idea behind those '[MODIFIED]' blocks follows below).
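
For context, the '[MODIFIED]' blocks in the referenced Medusa file mainly add tree ("Medusa") attention: each candidate token may only attend to the verified context and to its own ancestors in the candidate tree. Below is a minimal, library-agnostic sketch of such a mask; the function name and layout are illustrative, not TensorRT-LLM or Medusa API.

```python
import torch

def build_medusa_tree_mask(parent: list[int]) -> torch.Tensor:
    """Build a tree attention mask for Medusa candidate tokens.

    parent[i] is the index of token i's parent in the candidate tree,
    or -1 if the token hangs directly off the last verified token.
    Returns a (num_tokens, num_tokens) boolean mask where entry (i, j)
    is True iff candidate i may attend to candidate j.
    """
    n = len(parent)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        mask[i, i] = True          # every token attends to itself
        j = parent[i]
        while j != -1:             # ...and to all of its ancestors
            mask[i, j] = True
            j = parent[j]
    return mask

# Example: candidate 0 has children 1 and 2, and candidate 1 has child 3.
print(build_medusa_tree_mask([-1, 0, 0, 1]).int())
```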

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

1. Train the Medusa heads for the Bloom model
2. Adapt the spec_decoding_params parameter in bloom/model.py

Expected behavior

nothing

actual behavior

nothing

additional notes

nothing

skyCreateXian · Jul 15 '24 03:07

Is GatedMLP required for Medusa decoding? I found two things during debugging:

  1. The only difference between the modified bloom/model.py and llama/model.py lies in the MLP layer: llama uses GatedMLP, while Bloom uses a plain MLP (a side-by-side sketch of the two variants follows below).
  2. When the accepted tokens come from the Medusa result, the last accepted token is never aligned. Is the plain MLP layer unsuitable for the Medusa algorithm?
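
For reference, here is the architectural difference in question, sketched with standard PyTorch modules (illustrative only, not the TensorRT-LLM classes):

```python
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    """LLaMA-style MLP: SiLU(gate(x)) * up(x), followed by a down projection."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.gate = nn.Linear(hidden, inter, bias=False)
        self.up = nn.Linear(hidden, inter, bias=False)
        self.down = nn.Linear(inter, hidden, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class PlainMLP(nn.Module):
    """Bloom-style MLP: a single up projection with GELU, then a down projection."""
    def __init__(self, hidden: int, inter: int):
        super().__init__()
        self.fc = nn.Linear(hidden, inter)
        self.proj = nn.Linear(inter, hidden)

    def forward(self, x):
        return self.proj(F.gelu(self.fc(x)))
```

Both variants act on each token independently, with no cross-token interaction, which is consistent with the reply below that the MLP choice should not affect Medusa.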

skyCreateXian · Jul 16 '24 03:07

Hi @skyCreateXian, thank you for bringing this up. Agreed, we should have documentation on the steps required to make Medusa work with other models. I think you are on the right track. The following steps should be enough to support Medusa for other models:

  1. Adding spec_decoding_params to the base model (e.g. Bloom in this case).
  2. A new conversion script to combine the base model and the Medusa heads into a TensorRT-LLM checkpoint (a rough sketch follows below).
  3. Changing medusa/model.py to use the updated base model.
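
A rough sketch of step 2, assuming the weight layout used by the Medusa example conversion script; every key name here (the medusa_heads.* targets and the head state_dict keys) is an assumption to verify against the TensorRT-LLM version you use.

```python
import torch
from safetensors.torch import load_file, save_file

def add_medusa_heads(base_ckpt: str, medusa_ckpt: str, out_ckpt: str,
                     num_heads: int, num_layers: int = 1) -> None:
    """Merge trained Medusa head weights into a converted base-model checkpoint."""
    weights = load_file(base_ckpt)                        # converted Bloom rank weights
    medusa = torch.load(medusa_ckpt, map_location="cpu")  # state_dict of the trained heads

    for h in range(num_heads):
        for l in range(num_layers):
            # ResBlock linear layer of head h, layer l (key names are assumptions).
            prefix = f"medusa_heads.{h}.medusa_layers.{l}.linear"
            weights[f"{prefix}.weight"] = medusa[f"{h}.{l}.linear.weight"].clone()
            weights[f"{prefix}.bias"] = medusa[f"{h}.{l}.linear.bias"].clone()
        # Per-head LM head that predicts the token h+1 positions ahead.
        weights[f"medusa_heads.{h}.lm_head.weight"] = medusa[f"{h}.{num_layers}.weight"].clone()

    save_file(weights, out_ckpt)
```

Steps 1 and 3 are mostly plumbing: the base model and its decoder layers accept a spec_decoding_params argument and forward it down to the Attention layer, mirroring what llama/model.py does, and medusa/model.py is pointed at the updated base model instead of LLaMA.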

To answer your question on the MLP, it shouldn't have any effect on Medusa. One other difference I can think of that can lead to poor accuracy is the position embedding: RoPE vs. ALiBi. With Medusa, a position-offset tensor is passed to the model to properly apply the position embedding to the Medusa tokens. I am not too familiar with ALiBi yet, but if it requires more than just the position offsets, then that could be the other thing needed to support Medusa with Bloom.
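
To make the position-offset point concrete, here is a small, library-agnostic illustration (not TensorRT-LLM code): with RoPE, each Medusa candidate only needs a position id equal to the last verified position plus its depth in the candidate tree, whereas ALiBi adds a per-head bias proportional to the query-key distance, so the tree structure would have to enter the bias computation as well.

```python
# Illustrative: compute position ids for Medusa candidate tokens from their
# depth in the candidate tree, which is what a RoPE-based kernel needs.
# ALiBi instead adds a bias of slope * (key_pos - query_pos) per head, so the
# same per-pair distances would have to respect the tree, not just an offset.
def medusa_position_ids(past_len: int, depth: list[int]) -> list[int]:
    """past_len: number of already-verified tokens; depth[i]: tree depth of
    candidate i (root candidates have depth 0)."""
    return [past_len + d for d in depth]

# Example: 10 verified tokens, one root candidate with two depth-1 children.
print(medusa_position_ids(10, [0, 1, 1]))  # [10, 11, 11]
```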

I hope this helps. Please let us know how it goes and/or if you have any more questions.

rakib-hasan · Jul 16 '24 05:07

@rakib-hasan How can I verify the differences caused by the position-encoding algorithm? I found that forcibly changing position_embedding_type to "rope_gpt_neox" in convert_checkpoint did not work.
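
One quick way to see which scheme the converted checkpoint actually declares is to inspect its generated config (a small sketch; the path is hypothetical, and the reply below explains why merely switching this field cannot work for a model trained with ALiBi):

```python
import json

# Path is hypothetical; point it at the output directory of convert_checkpoint.
with open("tllm_checkpoint_bloom/config.json") as f:
    cfg = json.load(f)

# Expected values: "alibi" for Bloom, "rope_gpt_neox" for LLaMA-style models.
print(cfg.get("position_embedding_type"))
```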

skyCreateXian · Jul 16 '24 13:07

Hello, and how can Medusa be used to support the Qwen model? It is different from both LLaMA and Bloom.

sundayKK · Jul 20 '24 02:07

@sundayKK I adapted Qwen2-7B, but found that the result was completely different from the base model, so it failed (a simple alignment check is sketched after the steps below). You can follow these steps:

  1. Adapt Qwen training in Medusa to obtain the trained heads
  2. Modify models/medusa/model.py to support Qwen
  3. Modify models/qwen/model.py to support the speculative-decoding parameters
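
Since the failure mode here is mismatched outputs, a simple way to quantify it is to compare the greedy outputs of the Medusa engine and the base model token by token (a minimal sketch; how you obtain the two token lists depends on your runner):

```python
def first_mismatch(base_tokens: list[int], medusa_tokens: list[int]) -> int | None:
    """Return the index of the first position where the two greedy outputs
    diverge, or None if they agree over the shared length."""
    for i, (a, b) in enumerate(zip(base_tokens, medusa_tokens)):
        if a != b:
            return i
    return None

# Example: the outputs diverge at position 3.
print(first_mismatch([5, 9, 2, 7, 1], [5, 9, 2, 4, 1]))  # -> 3
```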

skyCreateXian · Jul 23 '24 11:07

@skyCreateXian Apologies for the late response. That sounds correct. Changing the position encoding at inference time won't work, as the Bloom model was trained with ALiBi. The problem is that, as I understand it, the XQA kernel supports tree attention (required by Medusa) but doesn't support ALiBi. So, at this point, Medusa won't work with models that use ALiBi.

@sundayKK It seems Qwen2 uses RoPE, so it should be compatible. I don't know that architecture's details yet, but are there any other differences between Qwen2 and LLaMA?

rakib-hasan · Jul 24 '24 01:07

@skyCreateXian @rakib-hasan thanks for your answer! I'd like to try.

sundayKK · Jul 24 '24 02:07

@rakib-hasan I tested Qwen2-7B and found that it cannot be aligned on this model either, so I suspect the diff is not caused by positional-encoding differences. I will continue to check.

skyCreateXian · Aug 06 '24 09:08

@skyCreateXian any updates on this?

poweiw · May 21 '25 22:05

@poweiw no, thanks

skyCreateXian · May 22 '25 02:05

Closing for now. Feel free to reopen when the work resumes, thanks! 👍

poweiw · May 29 '25 19:05