moondream
API Endpoint Recommendation
Do you have a recommendation for the easiest way to set up an API endpoint that dynamically batches requests, as vLLM does?
I realise this is probably quite involved, but perhaps you have some suggestions on the quickest path to hack together a working solution.
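For readers unfamiliar with the term, dynamic batching here means holding incoming requests for a short window and running them through the model together. The following is only a minimal sketch of that idea using `asyncio`, not how vLLM implements it. `DynamicBatcher`, `run_batch`, `max_batch_size`, `max_wait_s` and the `fake_model` stand-in are all hypothetical names chosen for illustration.

```python
import asyncio
from typing import Any, Callable, List


class DynamicBatcher:
    """Groups concurrent requests and calls the model once per batch."""

    def __init__(self, run_batch: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 8, max_wait_s: float = 0.01):
        self.run_batch = run_batch          # hypothetical batched inference function
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: Any) -> Any:
        """Called once per request; resolves when the item's batch has run."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def worker(self) -> None:
        """Background loop: wait for one item, then fill the batch until
        it is full or max_wait_s has elapsed."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            items = [item for item, _ in batch]
            try:
                # Run the blocking model call off the event loop.
                results = await loop.run_in_executor(None, self.run_batch, items)
                for (_, fut), result in zip(batch, results):
                    if not fut.done():
                        fut.set_result(result)
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)


async def demo():
    calls = []  # records the size of each batch the fake model receives

    def fake_model(prompts: List[str]) -> List[str]:
        calls.append(len(prompts))
        return [p.upper() for p in prompts]

    batcher = DynamicBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    task = asyncio.create_task(batcher.worker())
    answers = await asyncio.gather(*(batcher.submit(f"q{i}") for i in range(6)))
    task.cancel()
    return answers, calls
```

In a real endpoint, each HTTP request handler would `await batcher.submit(...)` and `run_batch` would wrap the model's batched forward pass. The wait window trades a little latency for throughput.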
I'm wondering the same thing, and have started looking into writing a handler.py for deployment on Hugging Face Inference Endpoints.
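As a rough starting point, a custom handler for Inference Endpoints is a `handler.py` exposing an `EndpointHandler` class with `__init__(self, path)` and `__call__(self, data)`. The sketch below makes several assumptions: the request payload shape (`{"inputs": {"image": <base64>, "question": <str>}}`) and the `parse_request` helper are hypothetical, and the `encode_image` / `answer_question` calls follow the moondream README usage at the time of this thread, so they may differ in other revisions.

```python
import base64
import io
from typing import Any, Dict, List, Tuple


def parse_request(data: Dict[str, Any]) -> Tuple[bytes, str]:
    """Extract raw image bytes and the question from a request payload.

    Assumed (hypothetical) payload shape:
    {"inputs": {"image": "<base64 string>", "question": "..."}}
    """
    inputs = data["inputs"]
    image_bytes = base64.b64decode(inputs["image"])
    question = inputs.get("question", "Describe this image.")
    return image_bytes, question


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Heavy dependencies are imported lazily so the module itself
        # can be imported without them installed.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        # `path` is the local directory of the model repository.
        self.model = AutoModelForCausalLM.from_pretrained(
            path, trust_remote_code=True
        )
        self.tokenizer = AutoTokenizer.from_pretrained(path)

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        from PIL import Image

        image_bytes, question = parse_request(data)
        image = Image.open(io.BytesIO(image_bytes))
        # Assumed moondream API, per its README at the time of this thread.
        enc_image = self.model.encode_image(image)
        answer = self.model.answer_question(enc_image, question, self.tokenizer)
        return [{"answer": answer}]
```

This handles one request per call, so it does not batch dynamically on its own.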
Just created a pull request to add support to vLLM: https://github.com/vllm-project/vllm/pull/4228
That’s great, thanks
On Sun 21 Apr 2024 at 00:55, vik wrote:
> Just created a pull request to add support to vLLM: https://github.com/vllm-project/vllm/pull/4228