
Using web-hosted model for inference

dnnp2011 opened this issue 6 months ago · 13 comments

Currently, the NousResearch/Llama-2-7b-chat-hf model appears to run locally on my machine, which can take quite a while for long prompts. I'd like to use more AI-optimized hardware to speed this process up.
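
For reference, a minimal sketch of the local setup being described, assuming the standard `PromptCompressor` usage from the LLMLingua README (the prompt text and `target_token` value are placeholders):

```python
from llmlingua import PromptCompressor

# By default this loads NousResearch/Llama-2-7b-chat-hf locally;
# device_map="cuda" keeps inference on a local GPU.
llm_lingua = PromptCompressor(device_map="cuda")

compressed = llm_lingua.compress_prompt(
    "Long prompt text to compress...",  # placeholder prompt
    target_token=200,
)
print(compressed["compressed_prompt"])
```
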

Is it possible to use a web-hosted version of the model, or use a different web-hosted model entirely?

dnnp2011 · Jan 05 '24 14:01