Explain the generation parameters in the benchmarking utility

Open Blair-Johnson opened this issue 2 years ago • 8 comments

Feature request

Improved README.md for the benchmarking utility that explains the different command line arguments.

Motivation

The benchmarking tool is awesome; I would just like some additional information about the command line parameters, specifically sequence-length and decode-length. I think I know what these parameters mean in the context of this project, but a description would help people relate the benchmark performance numbers to real-world usage. More detailed descriptions of the values displayed by the benchmarking utility would also be helpful, especially the difference between prefill and decode performance.

Your contribution

I would be happy to help write some documentation if someone could provide more detailed descriptions of these parameters and how they relate to model inference.

Blair-Johnson avatar Jun 06 '23 18:06 Blair-Johnson

Adding real rustdoc to the clap args here: https://github.com/huggingface/text-generation-inference/blob/main/benchmark/src/main.rs#L16

That should be plenty of documentation. It automatically documents the CLI itself ( -h ) and the rustdoc, and for the README we could simply tell users to run -h for advanced usage.

Narsil avatar Jun 06 '23 21:06 Narsil

That would be a great start. It would also be nice to document the actual metrics that are captured in the output of the benchmark. What is actually being measured, etc.

Blair-Johnson avatar Jun 07 '23 15:06 Blair-Johnson

~~Accidental close~~

Blair-Johnson avatar Jun 07 '23 15:06 Blair-Johnson

Are you willing to open a PR for it?

Narsil avatar Jun 08 '23 08:06 Narsil

Yes, I just need some explanation of the different measurements and parameters.

Blair-Johnson avatar Jun 09 '23 17:06 Blair-Johnson

Hi @Blair-Johnson, could you please explain sequence-length and decode-length for me? I find them confusing. Can I roughly think of them as input length and output length? Thanks!

Tracin avatar Jun 15 '23 11:06 Tracin

Sorry, I missed your reply, @Blair-Johnson. I created a PR with a first draft; feel free to ask questions so we can make the descriptions even clearer.

Narsil avatar Jun 15 '23 14:06 Narsil

@Narsil No worries, thank you for your work! I'll update your PR with some questions. @Tracin Take a look at the files changed in the PR for Narsil's descriptions of those parameters. I think your description is probably accurate, but I have a few follow-up questions that I'll ask there.

Blair-Johnson avatar Jun 15 '23 15:06 Blair-Johnson
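
For anyone reading along, here is a rough sketch of the interpretation discussed above, assuming sequence-length is the number of prompt tokens per request and decode-length is the number of generated tokens per request (the reading Tracin proposes, which has not been confirmed in this thread). The names `RunShape` and `summarize`, and the formulas themselves, are hypothetical illustrations and are not taken from the benchmark's source; the PR mentioned above is the place to check what the tool actually reports.

```python
from dataclasses import dataclass


@dataclass
class RunShape:
    # Hypothetical names mirroring the CLI flags discussed in this thread.
    batch_size: int
    sequence_length: int  # assumed: prompt tokens per request
    decode_length: int    # assumed: tokens generated per request


def summarize(shape: RunShape, prefill_s: float, decode_s: float) -> dict:
    """Derive per-phase figures from two measured wall-clock times.

    prefill_s: time to process every prompt in the batch once.
    decode_s:  time to generate decode_length tokens for every request.
    The formulas are illustrative assumptions, not the benchmark's own.
    """
    prompt_tokens = shape.batch_size * shape.sequence_length
    generated_tokens = shape.batch_size * shape.decode_length
    return {
        "prompt_tokens": prompt_tokens,
        "generated_tokens": generated_tokens,
        "prefill_tokens_per_s": prompt_tokens / prefill_s,
        "decode_tokens_per_s": generated_tokens / decode_s,
        "decode_s_per_token": decode_s / shape.decode_length,
    }


if __name__ == "__main__":
    shape = RunShape(batch_size=4, sequence_length=512, decode_length=128)
    for name, value in summarize(shape, prefill_s=0.5, decode_s=3.2).items():
        print(f"{name}: {value}")
```

The two phases are kept separate because, under this reading, prefill handles the whole prompt in one pass while decode produces tokens one step at a time, so a single combined number would hide which phase dominates for a given workload.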