ggml

GPT Benchmarks

Open mallorbc opened this issue 1 year ago • 2 comments

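GPT models without a KV cache have to recompute the keys and values for every previous token at each generation step, so the total compute time grows superlinearly (roughly quadratically, not exponentially) with sequence length.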
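To make the scaling concrete, here is a minimal cost-model sketch. It only counts how many token positions have to be pushed through the per-token layers during generation; it is not ggml code, the function names are hypothetical, and it ignores the attention step itself, which adds a further factor of the context length.

```python
def positions_without_cache(prompt_len: int, new_tokens: int) -> int:
    """Token positions processed when every step re-runs the whole context."""
    total = 0
    context = prompt_len
    for _ in range(new_tokens):
        total += context   # re-process every token currently in the context
        context += 1       # the newly generated token joins the context
    return total


def positions_with_cache(prompt_len: int, new_tokens: int) -> int:
    """Token positions processed when keys/values of earlier tokens are cached."""
    if new_tokens == 0:
        return 0
    # The prompt is processed once; each later step processes only the
    # single token produced by the previous step.
    return prompt_len + (new_tokens - 1)


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, positions_without_cache(8, n), positions_with_cache(8, n))
```

Under this model, doubling the number of generated tokens roughly quadruples the uncached cost while only doubling the cached cost, which is why the prompt length and the number of generated tokens both matter when reading a benchmark.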

So, for your benchmarks, how many tokens were generated, and what was the total sequence length (prompt plus generated tokens)? Does this implementation support a KV cache?

mallorbc, Oct 13 '22 04:10