Request for Training Code and Feature Fusion Details in EAGLE-3
Hi, I have a few questions and requests related to EAGLE-3 training and the feature fusion implementation.
- Could you share the data generation code used to build the training dataset for EAGLE-3?
- Regarding feature fusion, the paper mentions using low-, mid-, and high-level hidden states from the decoder (see also the fusion sketch at the end of this post).
  - I assume the high-level feature refers to the final decoder layer (right before the LM head).
  - Could you clarify which specific decoder layers are used for the low- and mid-level features?
- It seems the current `train/main.py` does not include the updated loss function for EAGLE-3. Would it be possible to share the full training script or `main.py` used for training EAGLE-3?
Thanks,
```python
# Collect hidden states at three depths (low / mid / high) for feature fusion
for idx, decoder_layer in enumerate(self.layers):
    if idx == len(self.layers) - 3 or idx == len(self.layers) // 2 or idx == 2:
        all_hidden_states += (hidden_states,)
```

`EAGLE/eagle/model/modeling_llama_kv.py`, line 1138
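
In case it is useful for the discussion, here is a minimal sketch of how the three collected hidden states could be fused before being passed to the draft model: concatenation along the feature dimension followed by a learned linear projection back to the hidden size. The `FeatureFusion` name, the bias-free projection, and the shapes are my assumptions, not the actual EAGLE-3 implementation.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Hypothetical fusion of low/mid/high decoder hidden states (assumed design)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Project the concatenated [low; mid; high] features back to hidden_size.
        self.proj = nn.Linear(3 * hidden_size, hidden_size, bias=False)

    def forward(self, low: torch.Tensor, mid: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Each input: (batch, seq_len, hidden_size)
        fused = torch.cat([low, mid, high], dim=-1)  # (batch, seq_len, 3 * hidden_size)
        return self.proj(fused)                      # (batch, seq_len, hidden_size)

if __name__ == "__main__":
    batch, seq_len, hidden = 2, 8, 4096
    low, mid, high = (torch.randn(batch, seq_len, hidden) for _ in range(3))
    fusion = FeatureFusion(hidden)
    print(fusion(low, mid, high).shape)  # torch.Size([2, 8, 4096])
```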