Q-Align
Questions about modeling_llama2
Hi! Thanks for your brilliant work! However, when I try to use Q-Align and Llama3 together in one Python file, I find that the following code in the Q-Align script "modeling_llama2.py" monkey-patches the transformers library globally, which then conflicts with Llama3 weight loading and inference.
def replace_llama_modality_adaptive():
    transformers.models.llama.configuration_llama.LlamaConfig = LlamaConfig
    transformers.models.llama.modeling_llama.LlamaAttention = LlamaAttention
    transformers.models.llama.modeling_llama.LlamaFlashAttention2 = LlamaFlashAttention2
    transformers.models.llama.modeling_llama.LlamaSdpaAttention = LlamaSdpaAttention
    transformers.models.llama.modeling_llama.LlamaDecoderLayer = LlamaDecoderLayer
    transformers.models.llama.modeling_llama.LlamaModel.forward = model_forward
    transformers.models.llama.modeling_llama.LlamaForCausalLM.forward = causal_model_forward
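In case it helps, here is a minimal sketch of one possible workaround I am considering on my side (not part of Q-Align): snapshot the stock Llama symbols before replace_llama_modality_adaptive() runs, and temporarily restore them while loading or running a vanilla Llama3 model. The helpers stock_llama and _apply below are hypothetical names I made up; the module paths are assumed to match the snippet above.

import contextlib
import transformers

_llama = transformers.models.llama.modeling_llama
_llama_cfg = transformers.models.llama.configuration_llama

# Snapshot of the stock symbols, taken before Q-Align's patch is applied.
_STOCK = {
    "config": _llama_cfg.LlamaConfig,
    "attn": _llama.LlamaAttention,
    "flash_attn": _llama.LlamaFlashAttention2,
    "sdpa_attn": _llama.LlamaSdpaAttention,
    "decoder_layer": _llama.LlamaDecoderLayer,
    "model_forward": _llama.LlamaModel.forward,
    "causal_forward": _llama.LlamaForCausalLM.forward,
}

def _apply(symbols):
    # Write a set of symbols back onto the transformers llama modules.
    _llama_cfg.LlamaConfig = symbols["config"]
    _llama.LlamaAttention = symbols["attn"]
    _llama.LlamaFlashAttention2 = symbols["flash_attn"]
    _llama.LlamaSdpaAttention = symbols["sdpa_attn"]
    _llama.LlamaDecoderLayer = symbols["decoder_layer"]
    _llama.LlamaModel.forward = symbols["model_forward"]
    _llama.LlamaForCausalLM.forward = symbols["causal_forward"]

@contextlib.contextmanager
def stock_llama():
    # Temporarily undo the Q-Align monkey-patch, then re-apply it on exit,
    # so a vanilla Llama3 checkpoint can be loaded inside this context.
    patched = {
        "config": _llama_cfg.LlamaConfig,
        "attn": _llama.LlamaAttention,
        "flash_attn": _llama.LlamaFlashAttention2,
        "sdpa_attn": _llama.LlamaSdpaAttention,
        "decoder_layer": _llama.LlamaDecoderLayer,
        "model_forward": _llama.LlamaModel.forward,
        "causal_forward": _llama.LlamaForCausalLM.forward,
    }
    _apply(_STOCK)
    try:
        yield
    finally:
        _apply(patched)

This is only a stopgap, though, since the two models still cannot run concurrently with their own attention implementations.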
Could you please refactor these scripts so that they do not directly modify the transformers library? Thanks!