PainlessInferenceAcceleration issues

Question: What does batch_indices mean? What does the last parameter, ids, of LookaheadCache's put and stream_put functions mean? What does idx in class Tree() represent?

Question: What does batch_indices mean? What does the last parameter, idx, of LookaheadCache's put and stream_put functions mean? Are mode and idx directly related? Why does idx take the batch index values 0, 1, 2, 3, ... when mode='input', and why is idx=-1 when mode='output'? https://github.com/alipay/PainlessInferenceAcceleration/blob/8015f12f7fe32acc102bb3eb51c4f8b3a420e79c/pia/lookahead/common/pretrained_model_batch.py#L1254-L1259

```python
def put(self, token_ids, branch_length=8, final=False, mode='output', idx=0):
```

Why is `idx is only used for...

handsome-chips

stop_ids does not seem to be taking effect?

When I manually set StoppingCriteria, Lookahead can only generate one token.

jianyuheng

modeling_qwen attention does not use multi-branch position ids & attention_mask

I reviewed the code of modeling_qwen.py, and I noticed that, within the lookahead process, the draft_ids matched from the TrieTree are such that the attention_mask and position ids associated with...

snippetzero
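For readers following the question above, here is a minimal sketch of what multi-branch position ids and an attention mask usually look like in tree-style draft verification. It is not taken from modeling_qwen.py; the function name `tree_mask_and_positions` and the `parents` encoding are assumptions made for illustration only.

```python
def tree_mask_and_positions(parents, prefix_len):
    """Build position ids and a draft-to-draft attention mask for a token tree.

    Hypothetical helper, not the project's API.
    parents[i] is the index (within the draft list) of draft token i's parent,
    or -1 if token i directly follows the already-decoded prefix.
    Parents must appear before their children.
    """
    n = len(parents)
    position_ids = []
    mask = []  # mask[i][j] is True when draft token i may attend to draft token j
    for i, p in enumerate(parents):
        if p == -1:
            depth = 0
            row = [False] * n
        else:
            depth = position_ids[p] - prefix_len + 1
            row = list(mask[p])  # inherit visibility of all ancestors
        row[i] = True  # every token attends to itself
        position_ids.append(prefix_len + depth)
        mask.append(row)
    return position_ids, mask


# Two branches after a 10-token prefix: 0 -> 1 -> 3 and 0 -> 2
positions, mask = tree_mask_and_positions([-1, 0, 0, 1], prefix_len=10)
```

In this sketch sibling tokens 1 and 2 share the same position id and cannot see each other; every draft token is additionally assumed to attend to the whole prefix, which the mask above does not show.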

Generation of Draft tokens and Trie tree creation

Hi, can you please share how the draft tokens are generated and which method you are using? And after that, how are they used to create the Trie tree? It will...

Vithulep
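As background for the question above, the following is a generic sketch of how a token trie can store previously seen token sequences and return a draft continuation for a given prefix. It does not reproduce the repository's LookaheadCache; the class and method names (`TokenTrie`, `insert`, `retrieve`) and the frequency-based greedy choice are assumptions for illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # token id -> TrieNode
        self.freq = 0       # number of inserted windows passing through this node


class TokenTrie:
    """Hypothetical token trie; not the project's implementation."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, token_ids, branch_length=4):
        # Store every window of up to branch_length tokens, starting at each position.
        for start in range(len(token_ids)):
            node = self.root
            for tok in token_ids[start:start + branch_length]:
                node = node.children.setdefault(tok, TrieNode())
                node.freq += 1

    def retrieve(self, prefix, max_len=3):
        # Walk down the trie along the prefix; no match means no draft.
        node = self.root
        for tok in prefix:
            node = node.children.get(tok)
            if node is None:
                return []
        # Greedily follow the most frequent child to build a single draft branch.
        draft = []
        while node.children and len(draft) < max_len:
            tok, node = max(node.children.items(), key=lambda kv: kv[1].freq)
            draft.append(tok)
        return draft


trie = TokenTrie()
trie.insert([1, 2, 3, 4, 5])
trie.insert([2, 3, 9])
trie.insert([2, 3, 4])
draft = trie.retrieve([2])
```

A real system would typically return several branches rather than one and verify them in a single forward pass; this sketch keeps a single greedy branch to stay short.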

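The paper includes Table 8, Inference Latency with Lookahead for vLLM.

Is the code for vLLM + lookahead available? Or does this code need to be modified? If so, how should it be changed to integrate with vLLM?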

zjjznw123
