How to measure latency properly?
Feature Type
Nice to have
Feature Description
I'd like to improve the docs on measuring latency, both for the project and for my own understanding.
In theory the formula is:
total_latency = eou.end_of_utterance_delay + llm.ttft + tts.ttfb
but in my measurements it does not predict the amount of silence between the person finishing talking and the agent starting to respond.
I downloaded a recording of a conversation and measured two types of turns: one where the agent calls a tool and one where it generates a response.
Notes on turns where the user's response makes the agent call a tool
This is a bit of a special case: many of my tools have scripted messages parametrized with the tool params, so each one ends in a parametrized session.say. This means the first token from the LLM cannot be sent to the TTS. The LLM has to finish generating the tool call text before the tool can be called, and only then is the message in the say sent to the TTS. I'm happy to be wrong, but that's how I understand this 😅
I also read somewhere that there's a SentenceTokenizer, so tokens are not sent to the TTS one by one but in chunks. I'm not sure whether the formula above reflects this, or whether it's possible to derive a closed-form expression given this mechanism.
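To make the question concrete, here is a rough sketch of how the formula might change if the TTS only receives text once a full chunk (or the scripted say message) is ready. The `first_chunk_delay` term and the function name are my own assumptions, not something taken from the library, and the numbers are placeholders rather than measurements.

```python
def estimated_latency(eou_delay, llm_ttft, tts_ttfb, first_chunk_delay=0.0):
    """Sum the per-stage delays (seconds) between end of user speech and first agent audio.

    first_chunk_delay is a hypothetical extra term (an assumption, not a library
    metric): the time between the first LLM token and the moment the first chunk
    of text is actually handed to the TTS.
    """
    return eou_delay + llm_ttft + first_chunk_delay + tts_ttfb


# Placeholder values, only to show the effect of the extra term.
base = estimated_latency(1.0, 1.0, 0.25)
buffered = estimated_latency(1.0, 1.0, 0.25, first_chunk_delay=0.4)
print(f"original formula: {base:.2f}s, with buffering term: {buffered:.2f}s")
```

If something like this holds, the original formula would be a lower bound whenever text is buffered before it reaches the TTS, but I don't know how to obtain the extra term from the reported metrics.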
Measurements
Calling a tool
- total measured in Audacity = 3.3 secs
- EOU delay (Langfuse) = 1.347 secs
- LLM TTFT (Langfuse) = 2.36 secs
- TTS TTFB (Langfuse) = 0.28 secs
- total using formula = 3.98
Generate a response
- total measured in Audacity = 2.6 secs
- EOU delay (Langfuse) = 1.00 secs
- LLM TTFT (Langfuse) = 1.07 secs
- TTS TTFB (Langfuse) = 0.25 secs
- total using formula = 2.33
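For reference, the comparison can be reproduced with a small script. This is only a sketch: the dictionary keys and the `predicted` helper are names I made up, and the sums are computed from the rounded values listed above, so they may differ from my totals by about 0.01 secs.

```python
# Per-turn values in seconds, copied from the measurements above.
turns = {
    "tool call": {"eou": 1.347, "llm_ttft": 2.36, "tts_ttfb": 0.28, "measured": 3.3},
    "response": {"eou": 1.00, "llm_ttft": 1.07, "tts_ttfb": 0.25, "measured": 2.6},
}


def predicted(turn):
    """Apply the formula: EOU delay + LLM TTFT + TTS TTFB."""
    return turn["eou"] + turn["llm_ttft"] + turn["tts_ttfb"]


# Positive gap: the recording shows more silence than the formula predicts.
gaps = {name: turn["measured"] - predicted(turn) for name, turn in turns.items()}

for name, turn in turns.items():
    print(
        f"{name}: predicted={predicted(turn):.2f}s "
        f"measured={turn['measured']:.2f}s gap={gaps[name]:+.2f}s"
    )
```

The formula overshoots the tool-call turn by roughly 0.7 secs and undershoots the response turn by roughly 0.3 secs, so the error does not even have a consistent sign.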
Am I doing something wrong? Could you give me some hints on how to measure this effectively? I want to understand which bottlenecks keep the agent from responding faster. My understanding is that the time from the user finishing talking to the agent starting to talk (that's what I call the total measured in Audacity) should be the same as the total using the formula.
Thanks in advance for any contribution / observations :)
Workarounds / Alternatives
No response
Additional Context
No response