text-generation-inference
Explain the generation parameters in the benchmarking utility
Feature request
Improved README.md for the benchmarking utility that explains the different command line arguments.
Motivation
The benchmarking tool is awesome; I would just like some additional information about the command line parameters, specifically sequence-length and decode-length. I think I know what these parameters mean in the context of this project, but a description would help people relate the benchmark performance numbers to real-world usage. More detailed descriptions of the parameters displayed by the benchmarking utility would also be helpful, especially the difference between prefill and decode performance.
Your contribution
I would be happy to help write some documentation if someone could provide more detailed descriptions of these parameters and how they relate to model inference.
Adding real rustdoc to the clap args here: https://github.com/huggingface/text-generation-inference/blob/main/benchmark/src/main.rs#L16
That should be plenty of documentation. It automatically documents the CLI itself (-h) and the rustdoc, and in the README we could simply tell users to use -h for advanced usage.
That would be a great start. It would also be nice to document the actual metrics that are captured in the output of the benchmark. What is actually being measured, etc.
~~Accidental close~~
Are you willing to open a PR for it?
Yes, I just need some explanation of the different measurements and parameters.
Hi @Blair-Johnson, could you please explain sequence-length and decode-length for me? I find them confusing. Can I roughly think of them as input length and output length?
Thanks!
Sorry, I missed your reaction, @Blair-Johnson. I created a PR with a first draft; feel free to ask questions so we can make those descriptions even clearer.
@Narsil No worries, thank you for your work! I'll update your PR with some questions. @Tracin Take a look at the files changed in the PR for Narsil's descriptions of those parameters. I think your description is probably accurate, but I have a few follow-up questions that I'll ask there.
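To make the "input length and output length" reading discussed above concrete, here is a minimal sketch of how the two phases could be related to throughput. It assumes that sequence-length is the number of prompt tokens per request (handled in the prefill phase) and decode-length is the number of tokens generated per request (handled in the decode phase). The `RunShape` struct, both functions, and the latency values are hypothetical and for illustration only; the thread does not state how the benchmarking utility actually computes its numbers, so treat this as one plausible interpretation rather than its implementation.

```rust
use std::time::Duration;

/// Hypothetical description of one benchmark run (not taken from the tool).
struct RunShape {
    batch_size: u32,
    /// Assumed meaning: prompt tokens per request, processed during prefill.
    sequence_length: u32,
    /// Assumed meaning: tokens generated per request, produced during decode.
    decode_length: u32,
}

/// Prompt tokens processed per second, given the latency of the prefill phase.
fn prefill_throughput(shape: &RunShape, latency: Duration) -> f64 {
    let tokens = (shape.batch_size * shape.sequence_length) as f64;
    tokens / latency.as_secs_f64()
}

/// Generated tokens per second, given the total latency of the decode phase.
fn decode_throughput(shape: &RunShape, latency: Duration) -> f64 {
    let tokens = (shape.batch_size * shape.decode_length) as f64;
    tokens / latency.as_secs_f64()
}

fn main() {
    // Made-up shape and latencies, chosen only to keep the arithmetic simple.
    let shape = RunShape {
        batch_size: 4,
        sequence_length: 512,
        decode_length: 128,
    };
    let prefill = prefill_throughput(&shape, Duration::from_millis(250));
    let decode = decode_throughput(&shape, Duration::from_millis(2000));
    println!("prefill: {prefill:.0} tokens/s");
    println!("decode:  {decode:.0} tokens/s");
}
```

Under these assumptions, prefill throughput is usually much higher than decode throughput because the whole prompt can be processed in one pass, while decoding produces tokens one step at a time. That would explain why the two phases are worth reporting separately.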