[Feature]: reuse vertex_ai client
The Feature
Vertex AI seems to have per-request overhead (probably auth related?), so the client needs to be reused for faster responses.
Below is an experiment comparing regeneration with a fresh client on every call (as litellm does) against reusing the model (and, internally, the client) in langchain.
Note that the times are higher for litellm because the input is not exactly the same (a difference in role, probably?), but the mean and standard deviation are consistent.
from langchain_community.chat_models.vertexai import ChatVertexAI
from litellm import completion
with litellm
%%timeit -n 1 -r 10
completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])
4.64 s ± 1.05 s per loop (mean ± std. dev. of 10 runs, 1 loop each)
with langchain (fresh client)
%%timeit -n 1 -r 10
ChatVertexAI(model_name="gemini-pro").invoke("write code for saying hi from LiteLLM")
2.44 s ± 102 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
with langchain (reuse model)
model = ChatVertexAI(model_name="gemini-pro")
%%timeit -n 1 -r 1
model.invoke("write code for saying hi from LiteLLM")
2.46 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
%%timeit -n 1 -r 10
model.invoke("write code for saying hi from LiteLLM")
1.41 s ± 311 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
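Note that the first invocation with the reused model above still pays the ~2.4 s setup cost; only subsequent calls drop to ~1.4 s, which points at one-time client initialization rather than per-request latency. Below is a minimal sketch of the kind of caching litellm could do internally. The helper names are hypothetical and the vertexai SDK's GenerativeModel API is assumed; this is an illustration of the idea, not litellm's actual implementation.

import functools
from typing import Optional

import vertexai
from vertexai.generative_models import GenerativeModel

# Hypothetical helper: cache one initialized model per
# (model_name, project, location) so repeated calls skip re-initialization.
@functools.lru_cache(maxsize=None)
def _get_vertex_model(model_name: str,
                      project: Optional[str] = None,
                      location: Optional[str] = None) -> GenerativeModel:
    # vertexai.init() sets up credentials once; cached calls reuse the
    # already-authenticated client instead of paying the setup cost again.
    vertexai.init(project=project, location=location)
    return GenerativeModel(model_name)

def cached_completion(model_name: str, prompt: str) -> str:
    model = _get_vertex_model(model_name)
    return model.generate_content(prompt).text

Keying the cache on (model_name, project, location) keeps one initialized client per configuration, which is consistent with the langchain reuse-model numbers above.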
Motivation, pitch
Reusing the Vertex AI client would remove this per-request overhead (probably auth related?) and give faster responses.
Twitter / LinkedIn details
https://www.linkedin.com/in/sufiyanadhikari/