PainlessInferenceAcceleration

modeling_qwen attention does not use multi-branch position ids & attention_mask

Open: snippetzero opened this issue 8 months ago • 1 comment

I reviewed the code of modeling_qwen.py and noticed that, during the lookahead process, the attention_mask and position ids associated with the draft_ids matched from the TrieTree do not appear to be used in the attention mechanism. I believe this might be an implementation error. If my understanding is incorrect, could you please point out where?
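For illustration, here is a minimal sketch of the per-branch position ids and attention mask I would expect the attention layer to consume when several draft branches are verified in one forward pass. It is not taken from modeling_qwen.py: the helper name `build_branch_inputs` and the parent-index encoding of the draft tree are assumptions made only for this example.

```python
def build_branch_inputs(prefix_len, parents):
    """Build position ids and a boolean attention mask for a draft tree.

    Hypothetical helper for illustration only (not the repository's API).
    parents[i] is the index of draft token i's parent inside the draft list,
    or -1 when the parent is the last token of the already decoded prefix.
    """
    n = len(parents)
    position_ids = []
    attention_mask = []
    for i in range(n):
        visible = [False] * n
        depth = 0
        j = i
        # Walk up to the root: a draft token may only see itself and its ancestors.
        while j != -1:
            visible[j] = True
            depth += 1
            j = parents[j]
        # Tokens at the same depth in different branches share a position id.
        position_ids.append(prefix_len + depth - 1)
        # Every draft token sees the whole prefix plus its own branch.
        attention_mask.append([True] * prefix_len + visible)
    return position_ids, attention_mask


# Prefix of 3 tokens; draft tree: token 0 is the root, tokens 1 and 2 are
# sibling children of 0, and token 3 continues the branch through token 2.
position_ids, attention_mask = build_branch_inputs(3, [-1, 0, 0, 2])
print(position_ids)
for row in attention_mask:
    print(row)
```

In this sketch, sibling tokens 1 and 2 get the same position id and cannot attend to each other. If the attention layer ignored these inputs and fell back to a plain causal mask with sequential positions, token 2 would attend to token 1 even though they belong to different branches.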

snippetzero · Jun 05 '24 03:06