gpt-fast
What's the input context length for the benchmark results?
With a longer input, the prefill-phase latency would be higher. Could you share the input token count used when obtaining the results in this post?
https://pytorch.org/blog/accelerating-generative-ai-2/?utm_content=273712248&utm_medium=social&utm_source=twitter&hss_channel=tw-776585502606721024
Low, maybe 5 tokens?
I see. It would be nice to benchmark with larger context lengths, since the first-token latency can increase significantly.
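For anyone who wants to see the effect themselves, here is a minimal timing sketch, not gpt-fast's actual benchmark code, that measures the prefill forward pass (a proxy for first-token latency) at several prompt lengths. The tiny stand-in model, the `time_first_token` helper, and the chosen prompt lengths are all placeholders so the script runs anywhere; the real gpt-fast model and tokenizer would need to be swapped in for meaningful numbers.

```python
# Sketch: time the prefill forward pass at several prompt lengths.
# `model` can be any callable mapping a [batch, seq] token tensor to logits;
# the stand-in module below is only there so the script is self-contained.
import time
import torch


@torch.no_grad()
def time_first_token(model, prompt_tokens: torch.Tensor,
                     warmup: int = 3, iters: int = 10) -> float:
    """Return mean seconds per prefill forward pass over the prompt."""
    for _ in range(warmup):
        model(prompt_tokens)
    if prompt_tokens.device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(prompt_tokens)
    if prompt_tokens.device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    # Placeholder model; replace with the real gpt-fast model for real numbers.
    vocab, dim = 32000, 256
    stand_in = torch.nn.Sequential(
        torch.nn.Embedding(vocab, dim),
        torch.nn.Linear(dim, vocab),
    )
    for prompt_len in (5, 128, 1024, 4096):
        tokens = torch.randint(0, vocab, (1, prompt_len))
        ms = time_first_token(stand_in, tokens) * 1e3
        print(f"prompt_len={prompt_len:5d}  prefill latency ~ {ms:.2f} ms")
```

Even with this toy model the prefill time grows with prompt length, which is why benchmarks run on ~5-token prompts will understate time-to-first-token for long-context workloads.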