ggml
GPT Benchmarks
GPT models without a KV cache must recompute the attention keys and values for the entire context at every decoding step, so the time per generated token grows quadratically with context length rather than linearly.
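To illustrate the recomputation being asked about (this is not ggml's actual implementation, just a minimal single-head numpy sketch where keys and values are taken directly from the inputs for brevity), compare decoding with and without a cache:

```python
import numpy as np

def attention(q, K, V):
    # q: (d,), K, V: (t, d) — scaled dot-product attention over t cached steps
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d, n = 8, 6
xs = rng.normal(size=(n, d))  # toy token states; k = v = x here (hypothetical)

# Without a KV cache: rebuild K and V from all previous tokens every step,
# so the work per token grows with the context length.
out_nocache = []
for t in range(1, n + 1):
    K = xs[:t]
    V = xs[:t]
    out_nocache.append(attention(xs[t - 1], K, V))

# With a KV cache: append one new row per step and reuse everything else.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
out_cache = []
for t in range(n):
    K_cache = np.vstack([K_cache, xs[t:t + 1]])
    V_cache = np.vstack([V_cache, xs[t:t + 1]])
    out_cache.append(attention(xs[t], K_cache, V_cache))

# Both strategies produce identical outputs; only the cost differs.
assert np.allclose(out_nocache, out_cache)
```

The outputs match exactly; caching only removes redundant work, which is why per-token latency in benchmarks depends heavily on whether a cache is used.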
So, for your benchmarks: how many tokens were generated, and out of how many total tokens (prompt plus generated)? And does ggml support KV caching?