llm-foundry
Inference Speed Benchmark
❓ Question
Hi, I am looking for a metric to compare the inference speed of the 7B, 13B, and 70B models. More precisely, I am looking for something like "X tokens/sec" on "M x N Type GPUs".
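In case it helps frame the question, here is a minimal sketch of how such a tokens/sec figure is typically measured: time a generation call and divide the number of newly generated tokens by the wall-clock time. The `generate_fn` interface below is hypothetical, not llm-foundry's actual API.

```python
import time


def tokens_per_second(num_new_tokens: int, elapsed_s: float) -> float:
    """Throughput metric: newly generated tokens per wall-clock second."""
    return num_new_tokens / elapsed_s


def benchmark(generate_fn, prompt_len: int, max_new_tokens: int) -> float:
    """Time one generation call and report tokens/sec.

    `generate_fn` is a hypothetical callable returning the total output
    sequence length (prompt + generated tokens).
    """
    start = time.perf_counter()
    output_len = generate_fn(prompt_len, max_new_tokens)
    elapsed = time.perf_counter() - start
    return tokens_per_second(output_len - prompt_len, elapsed)


# Stand-in "model" that sleeps briefly to simulate decoding latency.
def fake_generate(prompt_len: int, max_new_tokens: int) -> int:
    time.sleep(0.01)
    return prompt_len + max_new_tokens


if __name__ == "__main__":
    tps = benchmark(fake_generate, prompt_len=32, max_new_tokens=64)
    print(f"{tps:.1f} tokens/sec")
```

In practice the reported number also depends on batch size, prompt length, precision (e.g. bf16 vs int8), and whether prefill tokens are counted, so those details matter when comparing figures across model sizes and GPU counts.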