LLMLingua
Using web-hosted model for inference
Currently, the NousResearch/Llama-2-7b-chat-hf model appears to be running locally on my machine, which can take quite a while for long prompts. I'd like to use more AI-optimized hardware to speed this process up.
Is it possible to point LLMLingua at a web-hosted version of this model, or at a different web-hosted model entirely?
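For reference, this is roughly how I'm loading and calling the compressor today, following the README example (the `model_name`, `device_map`, and `compress_prompt` parameters are as documented there; the placeholder prompt is mine). As far as I can tell, both the model load and every forward pass happen on my local machine:

```python
# A minimal sketch of my current setup, based on the LLMLingua README;
# parameter names are assumed from the docs and may differ in newer releases.
from llmlingua import PromptCompressor

long_prompt = "..."  # placeholder for a long prompt that is slow to compress

# This downloads and runs NousResearch/Llama-2-7b-chat-hf locally.
llm_lingua = PromptCompressor(
    model_name="NousResearch/Llama-2-7b-chat-hf",
    device_map="cuda",  # "cpu" on machines without a GPU, which is slower still
)

result = llm_lingua.compress_prompt(
    long_prompt,
    instruction="",
    question="",
    target_token=200,
)
print(result["compressed_prompt"])
```

Ideally I could keep this interface but have the model inference itself served from a remote, GPU-backed endpoint.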