
[Feature]: reuse vertex_ai client

Open · dumbPy opened this issue on Feb 12, 2024 · 4 comments

The Feature

Vertex AI seems to have per-call overhead (probably auth related?), so the client needs to be reused for faster responses.

Below is an experiment I ran comparing regeneration with a fresh client on every call (as litellm does) against reusing the model (and, internally, the client) with langchain.

Note that the absolute time is higher with litellm since the input is not exactly the same (a difference in role formatting, probably), but the mean and standard deviation are consistent.
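For anyone reproducing the timings, the snippets below assume Vertex AI credentials and project settings are already configured. A minimal setup sketch (the project and location values are placeholders, not from this issue):

import litellm
import vertexai

# Placeholders; substitute your own GCP project and region.
litellm.vertex_project = "my-gcp-project"
litellm.vertex_location = "us-central1"
vertexai.init(project="my-gcp-project", location="us-central1")  # used by the vertexai SDK that ChatVertexAI wraps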

from langchain_community.chat_models.vertexai import ChatVertexAI
from litellm import completion

with litellm

%%timeit -n 1 -r 10
completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])

4.64 s ± 1.05 s per loop (mean ± std. dev. of 10 runs, 1 loop each)

with langchain (fresh client)

%%timeit -n 1 -r 10
ChatVertexAI(model_name="gemini-pro").invoke("write code for saying hi from LiteLLM")

2.44 s ± 102 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)

with langchain (reuse model)

model = ChatVertexAI(model_name="gemini-pro")  # run in its own cell; %%timeit must be the first line of a cell

%%timeit -n 1 -r 1
model.invoke("write code for saying hi from LiteLLM")

2.46 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

%%timeit -n 1 -r 10
model.invoke("write code for saying hi from LiteLLM")

1.41 s ± 311 ms per loop (mean ± std. dev. of 10 runs, 1 loop each)
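What the requested feature could look like: a minimal sketch (not litellm's actual internals; the cache dict and helper name are hypothetical) that memoizes the Vertex AI model object per model name, so repeated completion() calls share the already-authenticated client:

from vertexai.preview.generative_models import GenerativeModel

# Hypothetical module-level cache keyed by model name.
_vertex_model_cache: dict[str, GenerativeModel] = {}

def get_vertex_model(model_name: str) -> GenerativeModel:
    # Build the model (and its underlying authed client) once, then reuse it.
    if model_name not in _vertex_model_cache:
        _vertex_model_cache[model_name] = GenerativeModel(model_name)
    return _vertex_model_cache[model_name]

With reuse like this, the roughly one-second gap between the fresh-client and reused-model timings above should mostly disappear.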

Motivation, pitch

Vertex AI seems to have per-call overhead (probably auth related?), so reusing the client would give faster responses.

Twitter / LinkedIn details

https://www.linkedin.com/in/sufiyanadhikari/

dumbPy · Feb 12, 2024, 14:02