Call Remote LLM API Instead of Inferencing Locally
Currently, byaldi only supports local inference, e.g. RAG = RAGMultiModalModel.from_pretrained("vidore/colpali-v1.2"). Could the RAGMultiModalModel class be refactored to also call a remote vLLM API, e.g. RAGMultiModalModel.from_api("https://localhost:3000/v1/completions")?
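
A minimal sketch of what this could look like, assuming an OpenAI-compatible vLLM completions endpoint. The RemoteRAGMultiModalModel class, the from_api constructor, and the complete method are all hypothetical; nothing like this exists in byaldi today, and the request payload just follows the standard /v1/completions schema:

```python
import requests


class RemoteRAGMultiModalModel:
    """Hypothetical client mirroring RAGMultiModalModel, but delegating
    inference to a remote vLLM server instead of loading weights locally."""

    def __init__(self, endpoint: str, timeout: float = 60.0):
        # Full URL of the completions endpoint, as in the example above.
        self.endpoint = endpoint
        self.timeout = timeout

    @classmethod
    def from_api(cls, endpoint: str) -> "RemoteRAGMultiModalModel":
        # Mirrors the proposed RAGMultiModalModel.from_api(...) constructor.
        return cls(endpoint)

    def complete(self, prompt: str, model: str = "vidore/colpali-v1.2") -> str:
        # POST the prompt to the remote server rather than running the
        # model locally; assumes an OpenAI-style completions response.
        resp = requests.post(
            self.endpoint,
            json={"model": model, "prompt": prompt, "max_tokens": 256},
            timeout=self.timeout,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"]


# Usage, matching the example in the request:
RAG = RemoteRAGMultiModalModel.from_api("https://localhost:3000/v1/completions")
```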