gpt-fast
What's the input context length for the benchmark results?
With a longer input, the prefill-phase latency would be higher. Could you share the input token count used when obtaining the results in this post?
https://pytorch.org/blog/accelerating-generative-ai-2/?utm_content=273712248&utm_medium=social&utm_source=twitter&hss_channel=tw-776585502606721024
Low, maybe 5 tokens?
I see. It would be nice to benchmark with larger context lengths, since the first-token latency can increase significantly.
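For anyone who wants to see the effect themselves, here is a minimal timing sketch, not gpt-fast's actual benchmark code, that measures the prefill forward pass (a proxy for first-token latency) at several prompt lengths. The tiny stand-in model, the `time_first_token` helper, and the chosen prompt lengths are all placeholders so the script runs anywhere; the real gpt-fast model and tokenizer would need to be swapped in for meaningful numbers.

```python
# Sketch: time the prefill forward pass at several prompt lengths.
# `model` can be any callable mapping a [batch, seq] token tensor to logits;
# the stand-in module below is only there so the script is self-contained.
import time
import torch


@torch.no_grad()
def time_first_token(model, prompt_tokens: torch.Tensor,
                     warmup: int = 3, iters: int = 10) -> float:
    """Return mean seconds per prefill forward pass over the prompt."""
    for _ in range(warmup):
        model(prompt_tokens)
    if prompt_tokens.device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(prompt_tokens)
    if prompt_tokens.device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    # Placeholder model; replace with the real gpt-fast model for real numbers.
    vocab, dim = 32000, 256
    stand_in = torch.nn.Sequential(
        torch.nn.Embedding(vocab, dim),
        torch.nn.Linear(dim, vocab),
    )
    for prompt_len in (5, 128, 1024, 4096):
        tokens = torch.randint(0, vocab, (1, prompt_len))
        ms = time_first_token(stand_in, tokens) * 1e3
        print(f"prompt_len={prompt_len:5d}  prefill latency ~ {ms:.2f} ms")
```

Even with this toy model the prefill time grows with prompt length, which is why benchmarks run on ~5-token prompts will understate time-to-first-token for long-context workloads.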