Add FastAPI server
You meant to run it text-only via mlx_vlm; I was looking to use it text-only with mlx_lm because of its OpenAI-compatible API server.
https://github.com/ml-explore/mlx-examples/pull/1336#issuecomment-2718049702
I've done it here if helpful : https://github.com/pappitti/mlx-vlm/blob/main/mlx_vlm/server.py
I didn't aim for OpenAI compatibility though, just dynamic loading and unloading of models, with caching (one model at a time) while the server is running to avoid reloading, plus streaming support. The /generate endpoint works well (with images at least). The other endpoints are WIP: /chat is untested, and /batch_processing is just a placeholder (it's a project on its own).
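The one-model-at-a-time caching described above could be sketched roughly like this (a hypothetical pure-Python sketch, not the actual server code; `loader` stands in for the real mlx_vlm model-loading call):

```python
class SingleModelCache:
    """Keeps at most one model resident; reloads only when a different
    model id is requested, so repeated requests avoid reloading."""

    def __init__(self, loader):
        self._loader = loader      # e.g. a wrapper around mlx_vlm's load
        self._model_id = None
        self._model = None

    def get(self, model_id):
        if model_id != self._model_id:
            # Drop the old model first so its memory can be reclaimed
            # before the new one is loaded.
            self._model = None
            self._model_id = None
            self._model = self._loader(model_id)
            self._model_id = model_id
        return self._model

    def unload(self):
        """Explicitly free the cached model (e.g. via an /unload endpoint)."""
        self._model = None
        self._model_id = None
```

A /generate handler would then call `cache.get(request.model)` per request, paying the load cost only when the requested model changes.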
https://github.com/user-attachments/assets/46ba3a5f-d7b7-4cfa-ba8d-9898c4cdc2ab
This is awesome!
I have some ideas; let's collaborate on a PR.