
Prewarm the LLM before session starts

Open · davidzhao opened this issue 4 months ago · 9 comments

Currently (1.2) we prewarm the connection with STT and TTS, but not with the LLMs.

This is because LLMs usually require us to perform an inference request, instead of just opening an HTTP connection to /. There are a bunch of benefits to prewarming the LLM; the primary one is that we could bypass the initial connection setup time, which can add up to ~2s (DNS, SSL round trips).
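For concreteness, an inference-based prewarm would look roughly like this (sketch only, nothing like this exists in the framework today; the model name and prompt are placeholders):

```python
# Sketch of an inference-based LLM prewarm: fire a throwaway 1-token request
# before the session starts so DNS/TCP/TLS setup (and any provider-side work)
# happens ahead of the first real turn. Best-effort: failures must never
# block the session.
import asyncio

from openai import AsyncOpenAI


async def prewarm_llm(client: AsyncOpenAI) -> None:
    try:
        await client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,  # keep the warm-up request as cheap as possible
        )
    except Exception:
        pass


if __name__ == "__main__":
    asyncio.run(prewarm_llm(AsyncOpenAI()))
```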

davidzhao · Aug 23 '25

Looking for this as well. I'm getting better results with gpt-5, but the latency on the first turn is unbearable, which makes this more pressing: I'm seeing 4+ seconds for first calls vs 1.2 seconds for cached calls.

galigutta · Aug 26 '25

@davidzhao where do you prewarm STT and TTS in livekit agents? I guess it's not implemented for any plugins. We are seeing very high first-speech latency from the agent after the user's message. Can you help me with how you prewarm STT and TTS?

abhismatrix1 · Sep 01 '25

@davidzhao Is there a guide on how to prewarm LLMs explicitly?

ss14 · Sep 30 '25

Are there any updates on this?

anlagbr · Oct 01 '25

Hi @davidzhao,

I've been looking into LLM prewarming and wanted to share my findings.

You mentioned that LLMs typically require an inference request for prewarming, unlike STT/TTS which can just open an HTTP connection to /. However, I discovered that for OpenAI and other HTTP-based LLM APIs, we can actually achieve the same connection prewarming benefits without needing to send a full inference request.

For self-hosted LLMs, sending an inference request during prewarm makes sense to "wake up" the model and load it into memory. However, for public LLM services (OpenAI, Anthropic, Google, etc.), the models are already running and serving requests globally. In this case, the primary latency bottleneck is the client-side connection establishment (DNS, TCP, TLS), not the model availability.
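You can see the effect with a quick timing check like this (illustrative only; it times two GETs to the API's base URL over a shared httpx client, so the second one reuses the already-open connection):

```python
# Rough illustration: compare a cold request (DNS + TCP + TLS + request) with
# a warm one that reuses the pooled connection from the same client.
import asyncio
import time

import httpx


async def timed_get(client: httpx.AsyncClient, label: str) -> None:
    start = time.perf_counter()
    # The response status doesn't matter; we only care about connection setup cost.
    await client.get("/")
    print(f"{label}: {time.perf_counter() - start:.3f}s")


async def main() -> None:
    async with httpx.AsyncClient(base_url="https://api.openai.com") as client:
        await timed_get(client, "cold request")
        await timed_get(client, "warm request (connection reused)")


asyncio.run(main())
```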

I've implemented a prewarm() method for the OpenAI LLM plugin that:

  1. Makes a lightweight GET request to / in a background task
  2. Establishes the HTTP connection (DNS resolution, TCP handshake, TLS negotiation)
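Roughly, the idea looks like this (a simplified sketch rather than the exact code in the PR; the class and attribute names are illustrative):

```python
# Simplified sketch of a connection-only prewarm for an HTTP-based LLM plugin:
# issue a cheap GET to the API root in a background task so DNS resolution,
# the TCP handshake, and TLS negotiation are done before the first inference.
import asyncio

import httpx


class PrewarmableLLM:
    def __init__(self, base_url: str = "https://api.openai.com") -> None:
        # The same client must be reused for the real requests later, otherwise
        # the pooled connection (and the prewarm benefit) is lost.
        self._client = httpx.AsyncClient(base_url=base_url)
        self._prewarm_task: asyncio.Task | None = None

    def prewarm(self) -> None:
        """Start connection setup in the background; call from the running event loop."""
        if self._prewarm_task is None:
            self._prewarm_task = asyncio.create_task(self._warm_connection())

    async def _warm_connection(self) -> None:
        try:
            # The response itself is irrelevant; only the open connection matters.
            await self._client.get("/", timeout=5.0)
        except httpx.HTTPError:
            pass
```

The key detail is that the actual inference requests have to go through the same underlying client/session, so they reuse the connection opened by prewarm().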

Test Results

[test results screenshot]

If you believe this approach is correct, I'd be happy to submit a PR for your review! Let me know your thoughts!

Pulkit0729 · Nov 04 '25

@Pulkit0729 could you share your code even if dirty?

marctorsoc · Nov 07 '25

@marctorsoc I've added a PR where you can see the changes. The changes are only for the OpenAI LLM for now and can be added for others as well once the logic is approved.

Pulkit0729 · Nov 07 '25

thanks @Pulkit0729, I took a look and left a comment. Looks good to me!

marctorsoc · Nov 10 '25

@Pulkit0729 this is for the openai plugin. How would this work if I use OpenAI via LiveKit Inference? Is it worth prewarming and sticking with the plugin vs using LiveKit Inference? (It's supposed to be more stable; not sure about latency.)

@davidzhao @longcw is this on the roadmap / already implemented in 1.3?

marctorsoc · Nov 24 '25