
How to use Medusa to support non llama models?

Open skyCreateXian opened this issue 7 months ago • 8 comments

System Info

Hardware: L20 Version: 0.11.0.dev20240625 Model: Bloom7b1

Who can help?

@ncomly-nvidia @byshiue I have trained a Medusa head for Bloom following the official Medusa documentation, but deployment requires modifying bloom/model.py. I adapted a version by referencing llama/model.py, but the accuracy is very poor. I therefore have two questions:

  1. Does Medusa support deploying models other than llama-class models?
  2. For other models' model.py, could you provide official guidance on the required Medusa modifications, like the '[MODIFIED]' markers in this reference: https://github.com/FasterDecoding/Medusa/blob/main/medusa/model/modeling_llama_kv.py ? I mainly adapted the spec_decoding_params parameter in bloom/model.py.
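Conceptually, the Medusa heads themselves are model-agnostic: each head is a small residual block plus an LM head applied to the backbone's final hidden state, predicting the token k steps ahead. Below is a minimal NumPy sketch of that idea. All names (`medusa_logits`, `blocks`, `lm_heads`) are illustrative only, not the actual Medusa or TensorRT-LLM API; the model-specific work the issue describes is wiring such heads and the speculative-decoding parameters into bloom/model.py.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size, num_heads = 8, 16, 3

def silu(x):
    # SiLU activation, as used in the Medusa residual block
    return x / (1.0 + np.exp(-x))

# One residual block and one LM head per Medusa head (random weights here;
# in practice these are trained on top of a frozen backbone)
blocks = [rng.normal(size=(hidden_size, hidden_size)) * 0.1 for _ in range(num_heads)]
lm_heads = [rng.normal(size=(hidden_size, vocab_size)) * 0.1 for _ in range(num_heads)]

def medusa_logits(h):
    """Draft logits for the next `num_heads` positions from one hidden state."""
    out = []
    for W, L in zip(blocks, lm_heads):
        h_k = h + silu(h @ W)   # residual block on the backbone hidden state
        out.append(h_k @ L)     # per-head LM head
    return np.stack(out)        # shape: (num_heads, vocab_size)

h = rng.normal(size=hidden_size)        # stand-in for the backbone's last hidden state
logits = medusa_logits(h)
print(logits.shape)  # (3, 16)
```

Because the heads only consume the final hidden state, the backbone architecture (Bloom vs. Llama) matters mainly for how the draft tokens and attention masks are fed back through the model, which is where the adaptation in bloom/model.py would live.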

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

  1. Train a Medusa head for the Bloom model
  2. Adapt the spec_decoding_params parameter in bloom/model.py

Expected behavior

nothing

actual behavior

nothing

additional notes

nothing

skyCreateXian · Jul 15 '24 03:07