
Better generation stats

Open LLukas22 opened this issue 2 years ago • 4 comments

I'm currently facing an issue where generation on a GPU sometimes slows down, and it's very hard to determine why (see https://github.com/rustformers/llm/pull/325).

It would be great if we had an option to get more detailed information from the generation process. Maybe we could divide the per-token times into the following categories:

  • Forward pass: Raw time spent in the evaluate function of the model
  • Sampler: Time spent sampling the tokens
  • Decoding: Time taken by the tokenizer to decode the tokens
  • Printing: Time spent invoking the callback and printing to the CLI

LLukas22 avatar Jun 25 '23 15:06 LLukas22

It would also be helpful to see the min and max time of each category, alongside the mean.
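
As a rough sketch of what that could look like: a small accumulator that tracks the count, total, min and max for one category, with the mean derived from the total. `DurationStats` and its methods are hypothetical names for illustration, not something that exists in the llm crate.

```rust
use std::time::Duration;

/// Running statistics for one timing category (hypothetical helper).
#[derive(Debug, Clone, Copy)]
struct DurationStats {
    count: u32,
    total: Duration,
    min: Duration,
    max: Duration,
}

impl DurationStats {
    fn new() -> Self {
        Self {
            count: 0,
            total: Duration::ZERO,
            // Start at the extremes so the first sample replaces both.
            min: Duration::MAX,
            max: Duration::ZERO,
        }
    }

    /// Folds one measurement into the running statistics.
    fn record(&mut self, sample: Duration) {
        self.count += 1;
        self.total += sample;
        self.min = self.min.min(sample);
        self.max = self.max.max(sample);
    }

    /// Mean of all recorded samples, or `None` if nothing was recorded.
    fn mean(&self) -> Option<Duration> {
        if self.count == 0 {
            None
        } else {
            Some(self.total / self.count)
        }
    }
}

fn main() {
    let mut forward = DurationStats::new();
    for ms in [10, 30, 20] {
        forward.record(Duration::from_millis(ms));
    }
    println!(
        "forward: min={:?} max={:?} mean={:?}",
        forward.min,
        forward.max,
        forward.mean()
    );
}
```

One such accumulator per category (forward pass, sampler, decoding, printing) would be enough to report all three values at the end of a run.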

jafioti avatar Jun 25 '23 17:06 jafioti

Sounds good to me. Would anyone be interested in doing this?

philpax avatar Jun 25 '23 20:06 philpax

I could give it a try, but I'm still kinda busy with the CUDA/OpenCL stuff, and I have no idea how I would implement performance metrics and logging correctly in Rust 😬

LLukas22 avatar Jun 26 '23 11:06 LLukas22

You can probably just use std::time::Instant - it should be precise enough for this application. Just create some Instants at each measurement point, then call .elapsed() on them to find the amount of time that has passed since that instant.
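
A minimal sketch of that idea, split into the four categories from the original post. Only `std::time::Instant` and `Duration` are real here; `evaluate`, `sample`, `decode`, `generate_step`, `timed` and `TokenTimings` are hypothetical stand-ins, not the llm crate's actual API.

```rust
use std::time::{Duration, Instant};

/// Wall-clock time spent in each phase while producing one token.
#[derive(Debug, Default, Clone, Copy)]
struct TokenTimings {
    forward: Duration,
    sampling: Duration,
    decoding: Duration,
    printing: Duration,
}

impl TokenTimings {
    fn total(&self) -> Duration {
        self.forward + self.sampling + self.decoding + self.printing
    }
}

/// Runs `f` and returns its result together with the elapsed time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

// Hypothetical stand-ins for the real model, sampler and tokenizer.
fn evaluate(prev_token: u32) -> Vec<f32> {
    (0..8u32).map(|i| ((i + prev_token) % 5) as f32).collect()
}

fn sample(logits: &[f32]) -> u32 {
    // Greedy pick: index of the first largest logit.
    let mut best = 0;
    for (i, &value) in logits.iter().enumerate() {
        if value > logits[best] {
            best = i;
        }
    }
    best as u32
}

fn decode(token: u32) -> String {
    format!("<{}>", token)
}

/// Produces one token and measures each phase separately.
fn generate_step(prev_token: u32, mut callback: impl FnMut(&str)) -> (u32, TokenTimings) {
    let (logits, forward) = timed(|| evaluate(prev_token));
    let (token, sampling) = timed(|| sample(&logits));
    let (text, decoding) = timed(|| decode(token));
    let ((), printing) = timed(|| callback(&text));
    let timings = TokenTimings { forward, sampling, decoding, printing };
    (token, timings)
}

fn main() {
    let mut token = 0;
    for _ in 0..3 {
        let (next, timings) = generate_step(token, |text| print!("{}", text));
        eprintln!(" {:?} (total {:?})", timings, timings.total());
        token = next;
    }
    println!();
}
```

The `timed` wrapper keeps the measurement points out of the generation logic itself, so it can be removed or feature-gated later without touching each call site.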

philpax avatar Jun 26 '23 23:06 philpax