[Feature Request]: Openai_like embedding model integration
Feature Description
OpenAI-like LLMs are already supported (via `OpenAILike`); it would be useful to have the same for embedding models. This would let users integrate custom frameworks more easily, for example DeepSpeed FastGen or vLLM model inference, as currently I can't see an easy way to integrate local (or hosted) pipeline-parallelised embedding models.
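For illustration, here is a minimal sketch of what such an integration could look like: a tiny client for any server exposing the OpenAI-compatible `/v1/embeddings` endpoint (as vLLM and similar servers do). All names here (`OpenAILikeEmbeddingClient`, `build_embeddings_request`) are hypothetical, not existing llama_index API, and the response handling assumes the standard OpenAI embeddings schema.

```python
import json
import urllib.request


def build_embeddings_request(model: str, texts: list[str]) -> dict:
    """Build the JSON body for an OpenAI-compatible /v1/embeddings call."""
    return {"model": model, "input": texts}


class OpenAILikeEmbeddingClient:
    """Hypothetical minimal client for an OpenAI-compatible embeddings
    endpoint, e.g. a locally hosted vLLM or DeepSpeed-MII deployment
    behind an OpenAI-style front end."""

    def __init__(self, api_base: str, model: str, api_key: str = "EMPTY"):
        self.api_base = api_base.rstrip("/")
        self.model = model
        self.api_key = api_key

    def embed(self, texts: list[str]) -> list[list[float]]:
        body = json.dumps(build_embeddings_request(self.model, texts)).encode()
        req = urllib.request.Request(
            f"{self.api_base}/embeddings",
            data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_key}",
            },
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        # The OpenAI schema returns one {"embedding": [...]} item per input.
        return [item["embedding"] for item in data["data"]]
```

An `OpenAILikeEmbedding` class in llama_index could wrap this same pattern, so any endpoint speaking the OpenAI embeddings protocol would work regardless of the backend framework.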
Reason
There is currently no way to point an embedding model at a custom OpenAI-compatible API endpoint; only recognised frameworks are supported.
Value of Feature
This would increase compatibility and enable embedding models to be parallelised across GPUs with frameworks such as DeepSpeed-MII and (I assume) vLLM.