
[Feature]: Improve import speed

Open lsorber opened this issue 10 months ago • 27 comments

The Feature

Simply importing LiteLLM can take up to 1 second, which is rather slow. Could the import speed be improved?

Minimal reproducible example:

uv run --python 3.10 --with litellm python -X importtime -c "import litellm"

Output:

import time: self [us] | cumulative | imported package
...
import time:    353694 |    1098960 | litellm

From the logs, you can see that litellm.utils already takes 250ms to load for example.
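For anyone digging in, the raw `-X importtime` trace can be sorted to find the worst offenders. A small hypothetical helper (the parsing assumes CPython's standard `import time: self | cumulative | name` line format, as in the output above):

```python
# Sketch: rank the slowest imports from a saved `python -X importtime` trace,
# captured with e.g.:  python -X importtime -c "import litellm" 2> importtime.log

def slowest_imports(lines, top=10):
    """Parse `-X importtime` output lines into (self_microseconds, module)
    pairs, sorted slowest-first."""
    rows = []
    for line in lines:
        # Each data line looks like:
        #   "import time:    353694 |    1098960 | litellm"
        if not line.startswith("import time:"):
            continue
        parts = [p.strip() for p in line[len("import time:"):].split("|")]
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skips the "self [us] | cumulative | ..." header line
        rows.append((int(parts[0]), parts[2]))
    return sorted(rows, reverse=True)[:top]

# Example on the trace lines quoted above:
sample = [
    "import time: self [us] | cumulative | imported package",
    "import time:    353694 |    1098960 | litellm",
]
print(slowest_imports(sample))  # -> [(353694, 'litellm')]
```

Sorting by the self column (rather than cumulative) points at the module doing the work itself, instead of blaming whichever package happened to import it first.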

Motivation, pitch

If a project has 3 such dependencies, it would take 3 seconds to import that project.

Are you a ML Ops Team?

No

Twitter / LinkedIn details

@LaurentSorber

lsorber avatar Jan 07 '25 14:01 lsorber

For me, it's super slow as well (even slower than for the OP):

uv run --python 3.10 --with litellm python -X importtime -c "import litellm"

import time: 20107192 | 20857950 | litellm

yachty66 avatar Jan 09 '25 05:01 yachty66

Hey @lsorber @yachty66, I tried looking into this but didn't have a good way to trace the issue - if you / anyone else can help us identify the bottleneck, we'd welcome a PR / some help here!

krrishdholakia avatar Jan 10 '25 16:01 krrishdholakia

Hey Krish. The problem on my side was actually the network I was on, not the library itself, so all is okay from my end. Regards, @krrishdholakia.

yachty66 avatar Jan 11 '25 20:01 yachty66

@krrishdholakia I don’t have the bandwidth to contribute a fix unfortunately. But the command I shared in the OP outputs a full trace of all imports and so should be a good starting point to analyze what’s causing it to be slow.

lsorber avatar Jan 11 '25 20:01 lsorber

cc: @ishaan-jaff in case this is helpful in your perf work

krrishdholakia avatar Jan 11 '25 21:01 krrishdholakia

This would be an amazing upgrade; it should be extremely fast to import litellm.

tcapelle avatar Feb 17 '25 14:02 tcapelle

@tcapelle if you want to contribute this fix - that would be helpful.

I wasn't able to make much headway in my investigation.

krrishdholakia avatar Feb 17 '25 14:02 krrishdholakia

I don't have much time at the moment, but o3-mini-high thinks this about the output of the uv command above:

The timing output shows that importing litellm takes nearly a second—which is a lot for a module import. A closer look at the trace reveals that litellm pulls in a very large dependency graph. In particular, heavy libraries such as OpenAI’s SDK, Pydantic, tokenizers (tiktoken, regex, etc.), and even parts of HTTP libraries are all being loaded up front.

Some key points:

  • Deep dependency tree: litellm imports many submodules and third-party libraries. For example, the OpenAI library and Pydantic are known to do a lot of work at import time.
  • Eager imports: All of these modules and submodules are imported immediately on startup, even if not all of them are used in every run. This "eager" import strategy adds up.
  • Initialization overhead: Some of these libraries do nontrivial initialization (e.g. caching, compiling regex patterns, reading metadata), which further delays the import.

Possible remedies:

  • Lazy imports: Restructure the package so that the heavy imports happen only when needed (i.e. inside functions rather than at module level).
  • Modularization: Split the functionality into smaller submodules so that users only import the parts they need.
  • Optimize dependencies: Evaluate whether all dependencies are necessary at startup, or whether some can be delayed or replaced with lighter alternatives.

In summary, the slowness is less an issue with litellm itself and more a side effect of a deep and heavy dependency graph. Addressing it will likely require a combination of lazy-loading and refactoring the package’s import structure.
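The lazy-import remedy in practice is just moving the import statement into the function that needs it; Python caches the module in sys.modules, so only the first call pays. A minimal sketch, with sqlite3 standing in for a heavy dependency like the OpenAI SDK (names here are illustrative, not litellm's internals):

```python
import sys

def embed(text):
    # Deferred import: loaded on the first call only. Python caches the
    # module in sys.modules afterwards, so repeat calls pay no import cost.
    import sqlite3  # stand-in for a heavy dependency, e.g. the OpenAI SDK
    return (text, sqlite3.sqlite_version)

# Before the first call, the heavy module has not been loaded at all:
print("sqlite3" in sys.modules)  # False in a fresh interpreter
embed("hello")
print("sqlite3" in sys.modules)  # True
```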

tcapelle avatar Feb 17 '25 15:02 tcapelle

The 1000-line __init__.py is probably the culprit...

tcapelle avatar Feb 17 '25 15:02 tcapelle

Here's an issue from Hugging Face diffusers where they dealt with the same problem: https://github.com/huggingface/diffusers/issues/4260 -- they copied the lazy-load patterns from the transformers package. Y'all might be able to adopt the same strategy as these packages.

fmmoret avatar Feb 21 '25 21:02 fmmoret

And here's that repo's PR with their first set of changes for that effort: https://github.com/huggingface/diffusers/pull/4829. It gave a 10x speedup, and there was still other low-hanging fruit for them after that PR.

fmmoret avatar Feb 21 '25 21:02 fmmoret

Indeed it feels very heavy for a 'lite' lib:

import time: 594194 | 2800485 | litellm

willbelr avatar Feb 23 '25 14:02 willbelr

Thanks @fmmoret, this approach looks promising - https://github.com/huggingface/diffusers/pull/4829/files#diff-b828b278a068d46526a52d0facbdff41298fc776db6b6a5a8c04cd264ea3043f

krrishdholakia avatar Feb 23 '25 14:02 krrishdholakia

Added to March 2025 roadmap

ishaan-jaff avatar Mar 07 '25 21:03 ishaan-jaff

Just want to point out that https://github.com/BerriAI/litellm/issues/2677 is related and contains some more info possibly of interest to anyone tackling this. 🙏

gimbo avatar Mar 21 '25 12:03 gimbo

pyinstrument is pretty helpful here; IMO it's much easier to grok what's going on with it than with importtime.

pyinstrument -r html -o profile.html -c 'import litellm'

Which gives this:

[pyinstrument profile screenshot]

Thoughts:

  • The network request is a real wild card, since it likely varies a lot from machine to machine. My guess is that the worst-case scenarios are usually related to this. If that could be made lazy, it seems like a big gain.
  • I was surprised at the cost of loading models, and how much of the time is just loading openai. I was really expecting the one network request for the huge JSON blob to be more of the culprit; maybe my internet is just good? This one is going to be hard to work around. Something like #9791 will probably be needed, but looking through that change, it adds quite a bit of complexity. Also, in testing that change, the import is quite fast (~200ms, which probably includes other overhead), but as soon as you access e.g. completion, it's back to the same speed. So you'd need to be very careful how you import from litellm to see actual speed gains.
  • The tokenizers look like a quick win, which I addressed in #9874

adrianlyjak avatar Apr 10 '25 03:04 adrianlyjak

Looks like biggest easy wins are:

  • use backup json instead of network request (maybe move this to a background job?)
  • merge in tokenizers PR (commented on there)

Any suggestions for the loading of pydantic models?

krrishdholakia avatar May 25 '25 00:05 krrishdholakia

merge in tokenizers PR (commented on there)

I'll get this one caught back up

use backup json instead of network request (maybe move this to a background job?)

As a short-term fix for loading time, that would be the simplest path forward, though maybe not the best compromise. It's a nice feature that the models and costs are always kept up to date, regardless of the installed library version.

IIRC, I went somewhat down the path of investigating how to lazy-load these, but a lot of things in main.py and related modules need this data on demand, so it ends up being a pretty big change and requires a lot of the model-configuration instantiation to be deferred.

Any suggestions for the loading of pydantic models?

Similarly, I think this would require significant changes to main.py and __init__.py to do lazy loading cleanly.

adrianlyjak avatar May 25 '25 01:05 adrianlyjak

Ways to improve:

  • Ban single-use imports (move them inside the functions that use them)
  • Import pydantic modules within functions (no top-level imports; use TYPE_CHECKING for linting)
  • Create HTTP clients on first use, not on init

These 3 changes should probably deliver the bulk of the value.
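A minimal sketch of the second and third points, with illustrative names (this is not litellm's actual layout): TYPE_CHECKING keeps pydantic visible to linters and type checkers without importing it at runtime, and the HTTP client is built on first use.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never executed at runtime, so pydantic
    # stays off the import-time critical path.
    from pydantic import BaseModel

_client = None

def get_http_client():
    """Create the shared HTTP client on first use, not at import."""
    global _client
    if _client is None:
        import urllib.request  # stand-in for e.g. httpx.Client() setup
        _client = urllib.request.build_opener()
    return _client

def serialize(model: "BaseModel") -> str:
    # The annotation is a string, so it needs no runtime pydantic import;
    # the function-level import below runs only when this is called.
    import json
    return json.dumps(model.model_dump() if hasattr(model, "model_dump") else model)
```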

krrishdholakia avatar May 31 '25 15:05 krrishdholakia

Same issue here. One can quickly profile it by running: python -X importtime your_code_that_imports_litellm.py

Importing LiteLLM requires a lot of time and the larger it gets (the more it supports), the more time it will (probably) require.

Lazy loading would help a lot, i.e. importing modules only right before they are needed. Maybe https://github.com/mnmelo/lazy_import is also helpful, but I have no personal experience with it. I usually resort to the first trick: import late, right before usage in the function.

vlerenc avatar Jun 28 '25 15:06 vlerenc

okay - PR 1 cuts 2 seconds from import time https://github.com/BerriAI/litellm/pull/12135

ishaan-jaff avatar Jun 28 '25 17:06 ishaan-jaff

PR 2 - cuts 0.35s https://github.com/BerriAI/litellm/pull/12140

ishaan-jaff avatar Jun 28 '25 21:06 ishaan-jaff

The improvements here are great, but I am still seeing ~2s import time the first time I access the litellm module.

In my case, I am using litellm.router.Router and integrating with Langfuse. To enable the callbacks for Langfuse, I need to import litellm. It would be nice if there were some way to isolate Router and allow callbacks to be defined directly, rather than needing to import litellm.

Just some food for thought -- thanks for your continued work here @ishaan-jaff!

whitfin avatar Jul 08 '25 18:07 whitfin

I'm currently clocking import litellm at ~~7.65~~ 5.5 seconds (updated to latest), which is a lot. That seems significantly higher than the times mentioned here. This is on an M1 Mac Studio, so not exactly a small machine.

Also running into issues around Router callbacks: everything goes back to a global, which also makes it difficult to isolate if you have multiple Router instances.

chrisgoddard avatar Aug 28 '25 15:08 chrisgoddard

Any performance engineers on this thread interested in helping us improve LiteLLM perf? We're hiring a full-time perf engineer.

If yes, please grab some time here

ishaan-jaff avatar Sep 02 '25 00:09 ishaan-jaff

I'm encountering the same issue. Import time seems to vary a lot, anywhere from 3 seconds to 20+ (usually around 4). I've tried installing it on an NVMe SSD, and I'm wired to a fiber connection with low latency, yet I still get pretty bad import times.

Here's my test output:

| Library | Import time |
| ------- | ----------- |
| json    | 7.56 ms     |
| typer   | 61.25 ms    |
| numpy   | 56.44 ms    |
| litellm | 3.226 s     |
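Numbers like these can be reproduced with a small hypothetical harness; each import has to run in a fresh subprocess, since a second import in the same interpreter just hits the sys.modules cache and reads as ~0 ms.

```python
# Sketch: measure per-library import time, one clean interpreter per import.
import subprocess
import sys
import time

def import_time_ms(module, runs=3):
    """Best-of-N wall-clock time to `import <module>` in a fresh interpreter."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best * 1000

for mod in ("json", "typer", "numpy", "litellm"):
    try:
        print(f"{mod:10s} {import_time_ms(mod):8.1f} ms")
    except subprocess.CalledProcessError:
        print(f"{mod:10s} (not installed)")
```

Note that each measurement includes interpreter startup (roughly 20-30 ms), so the absolute numbers run a bit high; the relative comparison between libraries still holds.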

My application needs to be fast so I might need to ditch LiteLLM if I can't solve this issue, which is a shame because I really like what this library offers. Hopefully we find a solution!

gael-vanderlee avatar Oct 31 '25 21:10 gael-vanderlee

Just wanted to chime in to say this would be quite important for me as well. I've also tried what was suggested in #16177, but without much luck: the import time remains around 2 seconds, which is about 50% of my app's startup time and feels unnecessary for a single module.

liukidar avatar Nov 17 '25 13:11 liukidar