
PainlessInferenceAcceleration issues (15 results)

Question: what does `batch_indices` mean? What does `idx`, the last parameter of the `put` and `stream_put` functions of `LookaheadCache`, mean? Is there a relationship between `mode` and `idx`? Why is `idx` set to the batch index (0, 1, 2, 3, ...) when `mode='input'`, but to -1 when `mode='output'`? https://github.com/alipay/PainlessInferenceAcceleration/blob/8015f12f7fe32acc102bb3eb51c4f8b3a420e79c/pia/lookahead/common/pretrained_model_batch.py#L1254-L1259

```python
def put(self, token_ids, branch_length=8, final=False, mode='output', idx=0):
```

Why `idx is only used for...

When I manually set StoppingCriteria, Lookahead can only generate one token.

I reviewed the code in modeling_qwen.py and noticed that, within the lookahead process, the draft_ids matched from the TrieTree are such that the attention_mask and position ids associated with...
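For context on this question: when several draft branches taken from a trie are verified in a single forward pass, a common approach is tree-style attention. Each draft token attends to the whole context plus its own ancestors in the draft tree (never to sibling branches), and its position id is the last context position plus its depth in the tree. The sketch below is a generic illustration under those assumptions, not the code in modeling_qwen.py; the function name and the `parents` encoding are hypothetical.

```python
from typing import List, Tuple


def build_draft_mask_and_positions(
    context_len: int, parents: List[int]
) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical helper: tree attention mask and position ids for draft tokens.

    Assumed encoding: parents[i] is the index of draft token i's parent among
    the draft tokens, or -1 if it directly follows the context, and every
    parent is listed before its children.
    Returns (mask, position_ids) for the draft tokens only. mask[i][j] == 1
    means draft token i may attend to position j of the full sequence
    (context tokens first, then draft tokens).
    """
    n = len(parents)
    depth = [0] * n
    mask: List[List[int]] = []
    for i, p in enumerate(parents):
        if p >= i:
            raise ValueError("each parent must be listed before its children")
        depth[i] = 1 if p < 0 else depth[p] + 1
        # Every draft token sees the full context; draft slots start masked.
        row = [1] * context_len + [0] * n
        # Walk up to the root, unmasking the token itself and its ancestors.
        j = i
        while j >= 0:
            row[context_len + j] = 1
            j = parents[j]
        mask.append(row)
    # A draft token at depth d sits d positions after the last context token.
    position_ids = [context_len - 1 + d for d in depth]
    return mask, position_ids
```

For example, with a 2-token context and `parents=[-1, 0, 0, 1]` (two branches sharing their first draft token), draft token 2 sees the context and draft token 0 but not its sibling draft token 1, and the position ids come out as `[2, 3, 3, 4]`: siblings at the same depth share a position id even though they occupy different slots in the input.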

Hi, could you share how the draft tokens are generated (which method you are using), and how they are then used to build the Trie tree? It will...
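The question does not say which draft-generation method the repository uses, so as a generic illustration only: one retrieval-based way to draft tokens is to store n-grams of previously seen token ids (for example from the prompt and earlier outputs) in a trie, then follow the most recent tokens down the trie and extend greedily by frequency. `TokenTrie`, `insert_ngrams` and `draft` are hypothetical names, not PIA's API.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.count = 0  # number of inserted sequences passing through this node


class TokenTrie:
    """Hypothetical n-gram trie over token ids (illustration, not PIA's TrieTree)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tokens: List[int]) -> None:
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())
            node.count += 1

    def insert_ngrams(self, tokens: List[int], n: int) -> None:
        # Insert every window of up to n tokens, including shorter tail windows.
        for i in range(len(tokens)):
            self.insert(tokens[i : i + n])

    def draft(self, prefix: List[int], max_len: int = 4) -> List[int]:
        """Match `prefix`, then greedily follow the most frequent child.

        Returns [] if the prefix is not in the trie. Ties go to the child
        inserted first, because max() returns the first maximal item.
        """
        node = self.root
        for t in prefix:
            if t not in node.children:
                return []
            node = node.children[t]
        out: List[int] = []
        while node.children and len(out) < max_len:
            t, node = max(node.children.items(), key=lambda kv: kv[1].count)
            out.append(t)
        return out
```

For instance, after `trie.insert_ngrams([1, 2, 3, 4, 1, 2, 5], n=4)`, `trie.draft([4, 1])` returns `[2, 5]`. Returning a single greedy branch keeps the sketch short; a multi-branch variant would keep the top few children at each level, which is why the drafts then need a tree-shaped attention mask and matching position ids during verification.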

Is there code for vLLM + lookahead? Or does this code need to be modified? If so, how should it be changed so that it can work together with vLLM?