
[Feature Request]: OpenAI-like embedding model integration

Open regybean opened this issue 8 months ago • 2 comments

Feature Description

OpenAI-like LLMs are supported, but there is no equivalent for embedding models. Adding one would let users integrate custom serving frameworks more easily, namely DeepSpeed FastGen or vLLM, since I currently can't see an easy way to integrate local (or hosted) pipeline-parallelised embedding models.
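For context, frameworks like vLLM expose an OpenAI-compatible `/v1/embeddings` endpoint, so the missing piece is just a client that doesn't validate model names against OpenAI's list. A minimal stdlib-only sketch of what such a client would call (the base URL and model name here are placeholder assumptions, not real endpoints):

```python
# Hedged sketch: talking to an OpenAI-compatible /v1/embeddings endpoint
# (as served by e.g. vLLM) without any model-name validation.
import json
import urllib.request


def build_embedding_request(base_url, model, texts, api_key="EMPTY"):
    """Build a POST request for an OpenAI-style embeddings endpoint.

    base_url and model are whatever the local server was launched with;
    api_key is often a dummy value for self-hosted servers.
    """
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_embedding_response(body):
    """Extract embedding vectors from an OpenAI-style JSON response,
    ordered by the "index" field so they align with the input texts."""
    data = json.loads(body)
    return [item["embedding"]
            for item in sorted(data["data"], key=lambda d: d["index"])]
```

An `OpenAILikeEmbedding` class could wrap exactly this kind of call behind llama_index's `BaseEmbedding` interface, mirroring how `OpenAILike` wraps the chat endpoint.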

Reason

There is no way to point the embedding integrations at a custom API endpoint; only recognised frameworks are supported.

Value of Feature

This would increase compatibility and enable embedding models to be parallelised across GPUs with frameworks such as DeepSpeed-MII and vLLM (I assume).

regybean Jun 24 '24 08:06