
Support for remote LLM through API

Open deltawi opened this issue 1 year ago • 4 comments

Hi team,

Due to the computing resources needed to run this, it would be nice if you could also add an option where the user can provide a url_endpoint and api_key for a remote REST API, instead of downloading the model from Hugging Face.

deltawi avatar Jan 18 '24 10:01 deltawi

Hi @deltawi, thank you for your interest in and support of LLMLingua.

Currently, since API models do not return log probabilities for the prompt tokens, it's challenging to support this requirement directly. However, we will incorporate this need into our future plans.
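To see why per-token log probabilities matter: LLMLingua-style compression ranks prompt tokens by how surprising (informative) the small local model finds them and drops the rest. Below is a minimal, self-contained sketch of that idea; the tokens and log-probability values are made up, and a real run would need logprobs from a locally hosted causal LM, which hosted chat/completions APIs generally do not return for arbitrary prompt tokens.

```python
def compress(tokens, logprobs, keep_ratio=0.5):
    """Keep the fraction of tokens with the lowest log-probability,
    i.e. the most surprising / informative ones (hypothetical helper)."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank token positions by ascending logprob (most informative first).
    ranked = sorted(range(len(tokens)), key=lambda i: logprobs[i])
    kept = sorted(ranked[:n_keep])  # restore original token order
    return [tokens[i] for i in kept]

# Made-up example: common words get high logprobs and are dropped.
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
logprobs = [-0.1, -3.2, -2.8, -4.1, -3.5, -0.4, -0.2, -2.9, -1.0]
print(compress(tokens, logprobs, keep_ratio=0.5))
# → ['quick', 'fox', 'jumps', 'lazy']
```

Since an API would have to expose this token-level signal for the compressor to rank anything, a plain completion endpoint is not enough on its own.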

Refer to issue #44.

iofu728 avatar Jan 18 '24 12:01 iofu728

Hi @iofu728, is it possible to run a model on a server and point the code to use that model over its API? E.g., run a Llama-2 7B on a server.

bytecod3r avatar Feb 10 '24 13:02 bytecod3r

Same need here. I love the concepts of LLMLingua and they are super useful; however, I don't have the ability to self-host inference for any model (for many different reasons: cost, know-how, security, capacity, etc.). I use Microsoft Azure AI and Fireworks AI, which offer small, fast models that could apparently be used for LLMLingua. I'd like the ability to use an API for the calls that LLMLingua needs.

Any comments on whether this will make it into the roadmap?

afbarbaro avatar May 28 '24 15:05 afbarbaro

Hi @afbarbaro, we support the API mode in Prompt flow; you can refer to this document to use it.

iofu728 avatar May 30 '24 09:05 iofu728