PainlessInferenceAcceleration
In the benchmark studies, how are the draft tokens generated?
I read with great interest your paper 'Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy'.
In essence, the paper proposes a tree data structure for verifying candidate draft tokens, thereby speeding up inference.
Unfortunately, it is not clear to me from the paper how these draft tokens were generated when establishing the benchmark results for LookAhead-Parallel and LookAhead-Hierarchical.
I understand that the focus of the paper is on how to handle a set of draft tokens (perhaps as a single branch, perhaps in parallel, or perhaps in a hierarchical manner). But the origin of the draft tokens in the benchmark results remains unclear to me.
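To make concrete what I mean by the verification step (as opposed to draft generation), here is a toy sketch of verifying a single branch of draft tokens; this is my own illustration, not the paper's code, and the `next_token` function is a stand-in for the LLM's greedy decoder:

```python
# Illustrative sketch only (not the paper's implementation): accept the
# longest prefix of a draft branch that matches what the model itself
# would generate, so the output is identical to plain decoding.
# A real implementation would score all draft positions in one forward
# pass; here a toy deterministic `next_token` stands in for the LLM.

def next_token(context):
    # Toy "model": next token is the sum of the context modulo 7.
    return sum(context) % 7

def verify_branch(context, draft_tokens):
    """Accept the longest matching prefix of draft_tokens, then append
    one token from the model itself (which is always valid), so every
    verification step produces at least one token."""
    accepted = []
    for draft in draft_tokens:
        predicted = next_token(context + accepted)
        if predicted != draft:
            break
        accepted.append(draft)
    accepted.append(next_token(context + accepted))
    return accepted

if __name__ == "__main__":
    ctx = [1, 2, 3]
    # First two draft tokens happen to match the toy model; the third does not.
    print(verify_branch(ctx, [6, 5, 0]))  # -> [6, 5, 3]
```

My question is about the box feeding `draft_tokens` into such a procedure: in the benchmarks, where do those candidates come from?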