TensorRT-LLM
How to use Medusa with non-Llama models?
System Info
Hardware: L20 Version: 0.11.0.dev20240625 Model: Bloom7b1
Who can help?
@ncomly-nvidia @byshiue I have obtained Medusa heads for Bloom by following the official Medusa documentation, but during deployment I need to modify bloom/model.py. I adapted a version by referencing llama/model.py, but the accuracy is very poor. Therefore, I have two questions:
- Does Medusa support deploying other models that are not llama classes?
- For the model.py of other model types, could you provide official Medusa modification guidance, like the '[MODIFIED]' markers in this reference: https://github.com/FasterDecoding/Medusa/blob/main/medusa/model/modeling_llama_kv.py I mainly adapted the spec_decoding_params parameter in bloom/model.py.
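To illustrate the kind of adaptation being described: a speculative-decoding parameter usually has to be threaded from the model's top-level forward all the way down into every attention layer. The sketch below shows that plumbing pattern only; all names (`SpecDecodingParams`, `ToyAttention`, `ToyDecoderLayer`, `ToyModel`) are hypothetical and do not match the real TensorRT-LLM API.

```python
# Hypothetical sketch of threading a spec-decoding argument through a decoder.
# None of these classes are the actual TensorRT-LLM implementation.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecDecodingParams:
    # Stand-ins for the Medusa speculative-decoding inputs.
    generation_lengths: List[int]
    position_offsets: List[int]


class ToyAttention:
    def __init__(self):
        self.received = False

    def forward(self, hidden, spec_decoding_params: Optional[SpecDecodingParams] = None):
        # A real attention layer would use the params to build the Medusa
        # tree mask; here we only record whether they arrived.
        self.received = spec_decoding_params is not None
        return hidden


class ToyDecoderLayer:
    def __init__(self):
        self.attention = ToyAttention()

    def forward(self, hidden, spec_decoding_params=None):
        # The layer forwards the params unchanged to its attention module.
        return self.attention.forward(hidden, spec_decoding_params)


class ToyModel:
    def __init__(self, num_layers: int = 2):
        self.layers = [ToyDecoderLayer() for _ in range(num_layers)]

    def forward(self, hidden, spec_decoding_params=None):
        # Every layer must see the params; dropping this plumbing in a ported
        # model.py is one plausible cause of silently degraded accuracy.
        for layer in self.layers:
            hidden = layer.forward(hidden, spec_decoding_params)
        return hidden


model = ToyModel()
params = SpecDecodingParams(generation_lengths=[4], position_offsets=[0, 1, 2, 3])
model.forward(hidden=[0.0], spec_decoding_params=params)
print(all(layer.attention.received for layer in model.layers))  # True
```

When porting from llama/model.py, one way to check this plumbing is to verify that the parameter reaches every attention layer rather than only the first.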
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
1. Trained Medusa heads for the Bloom model
2. Adapted the spec_decoding_params parameter in bloom/model.py
Expected behavior
nothing
actual behavior
nothing
additional notes
nothing