llm-analysis

latency [BUG]

Open Akash08naik opened this issue 1 year ago • 3 comments

The latency reported by the tool and the actual time I measure when running inference are not the same, and the difference between the two is large. What could be the problem?

Akash08naik avatar Jan 03 '24 10:01 Akash08naik

Can you share the way you run the tool and the actual time you saw in your benchmarking?

cli99 avatar Jan 03 '24 20:01 cli99

I ran the tool as a Slurm job. For the actual time, I loaded the model and timed it with the time module, from when the prompt is given until decoding finishes. All of this was done on a CPU, not a GPU.

Akash08naik avatar Jan 03 '24 22:01 Akash08naik
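
The measurement described above could be sketched roughly as follows. This is only an illustration, not the tool's own method: `load_model`, `timed`, and `benchmark` are hypothetical names, and the "model" is a stand-in function so the snippet runs without any ML library. It keeps model loading out of the inference timing and does a warm-up call first, since both can inflate a wall-clock number.

```python
import time


def load_model():
    """Hypothetical stand-in for loading a real model (assumption)."""
    time.sleep(0.05)  # simulate load cost
    return lambda prompt: prompt.upper()  # fake "generate" function


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def benchmark(prompt, warmup=1, runs=3):
    """Time loading and inference separately and average the inference runs."""
    model, load_s = timed(load_model)
    for _ in range(warmup):
        model(prompt)  # warm-up calls are not counted
    samples = [timed(model, prompt)[1] for _ in range(runs)]
    return {
        "load_s": load_s,
        "mean_infer_s": sum(samples) / len(samples),
        "runs": runs,
    }


if __name__ == "__main__":
    print(benchmark("hello"))
```

When comparing against an estimate, the `mean_infer_s` figure is the one to look at, not load plus inference. If the model runs on a GPU through PyTorch, a `torch.cuda.synchronize()` call before reading the clock is usually needed as well, because GPU kernels are launched asynchronously.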

What happened? The inference time reported by the tool does not match the inference time I measure in practice. I ran it on a V100 GPU with 16 GB of memory.

Akash08naik avatar Jan 08 '24 07:01 Akash08naik