PainlessInferenceAcceleration
modeling_qwen attention does not use multi-branch position_ids & attention_mask
I reviewed the code of modeling_qwen.py and noticed that, in the lookahead process, the attention_mask and position_ids associated with the draft_ids matched from the TrieTree do not appear to be used in the attention computation. I believe this might be an implementation error. Could you please point out where my understanding is incorrect?
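To make my question concrete: if I understand the multi-branch setting correctly, draft tokens from different trie branches should not attend to each other, and each draft token's position should follow its depth along its own branch rather than its flat index in the sequence. This is a minimal sketch of the behavior I would expect; the helper names and the parent-index encoding are my own for illustration and are not taken from modeling_qwen.py:

```python
def build_tree_attention_mask(parent_idx):
    """mask[i][j] is True iff draft token i may attend to draft token j,
    i.e. j is i itself or an ancestor of i on the same trie branch.
    parent_idx[i] is the parent's index in the draft list, or -1 for a
    token hanging directly off the last confirmed token.
    (Hypothetical helper, for illustration only.)"""
    n = len(parent_idx)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        mask[i][i] = True
        p = parent_idx[i]
        while p != -1:          # walk up to the branch root
            mask[i][p] = True
            p = parent_idx[p]
    return mask

def tree_position_ids(parent_idx, base):
    """Position id of each draft token: base + depth along its branch,
    so sibling branches reuse the same position values."""
    pos = []
    for i in range(len(parent_idx)):
        depth, p = 0, parent_idx[i]
        while p != -1:
            depth, p = depth + 1, parent_idx[p]
        pos.append(base + depth)
    return pos

# Two branches matched from the trie: 0 -> 1 -> 2 and 0 -> 3
parents = [-1, 0, 1, 0]
mask = build_tree_attention_mask(parents)
# Token 3 sees only itself and token 0, never the sibling branch 1 -> 2:
#   mask[3] == [True, False, False, True]
print(tree_position_ids(parents, base=10))  # [10, 11, 12, 11]
```

If the draft_ids are instead run through attention with an ordinary causal mask and sequential position_ids, tokens on one branch would attend to tokens on another, which is what prompted my question.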