Add FastAPI server
You meant to run it text-only via mlx_vlm; I was looking to use it text-only with mlx_lm because of its OpenAI-compatible API server.
https://github.com/ml-explore/mlx-examples/pull/1336#issuecomment-2718049702
I've done it here if helpful : https://github.com/pappitti/mlx-vlm/blob/main/mlx_vlm/server.py
I didn't aim for OpenAI compatibility though, just dynamic loading and unloading of models, with caching (one model at a time) while the server is running to avoid reloading, plus streaming support. The /generate endpoint works well (with images at least). The other endpoints are WIP: /chat is untested, and /batch_processing is just a placeholder (it's a project on its own).
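The one-model-at-a-time caching described above could be sketched roughly like this (a hypothetical pure-Python sketch, not the actual server code; `loader` stands in for the real mlx_vlm model-loading call):

```python
class SingleModelCache:
    """Keeps at most one model resident; reloads only when a different
    model id is requested, so repeated requests avoid reloading."""

    def __init__(self, loader):
        self._loader = loader      # e.g. a wrapper around mlx_vlm's load
        self._model_id = None
        self._model = None

    def get(self, model_id):
        if model_id != self._model_id:
            # Drop the old model first so its memory can be reclaimed
            # before the new one is loaded.
            self._model = None
            self._model_id = None
            self._model = self._loader(model_id)
            self._model_id = model_id
        return self._model

    def unload(self):
        """Explicitly free the cached model (e.g. via an /unload endpoint)."""
        self._model = None
        self._model_id = None
```

A /generate handler would then call `cache.get(request.model)` per request, paying the load cost only when the requested model changes.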
https://github.com/user-attachments/assets/46ba3a5f-d7b7-4cfa-ba8d-9898c4cdc2ab
This is awesome!
I have some ideas; let's collaborate on a PR.