mlx_vlm.server loads the CPU to 60-75% on a MacBook Pro M1 Pro, and /health doesn't work
Why does the Python process use 60-70% CPU (as shown in Activity Monitor) on my MacBook Pro M1 Pro immediately after I start mlx_vlm.server, and keep doing so until I press Control+C? As I understand it, that's not much, since Activity Monitor can show up to 800% on a multi-core machine. However, my Mac gets a little hot.
Also, while the /generate and /unload endpoints work correctly, /health returns {"detail":"Method Not Allowed"}.
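For context, "Method Not Allowed" is HTTP status 405, which normally means the path exists but is not registered for the HTTP method that was sent. Here is a minimal sketch for checking which methods /health accepts. The helper name `probe_methods` and the address `http://localhost:8000` are assumptions for illustration only; use whatever host and port the server prints at startup.

```python
import urllib.error
import urllib.request


def probe_methods(url, methods=("GET", "POST", "HEAD")):
    """Return a dict mapping each HTTP method to the status code the server returns."""
    results = {}
    for method in methods:
        # POST needs a body; an empty JSON object is enough for a probe.
        data = b"{}" if method == "POST" else None
        req = urllib.request.Request(url, data=data, method=method)
        if data is not None:
            req.add_header("Content-Type", "application/json")
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                results[method] = resp.status
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses raise HTTPError but still carry the status code.
            results[method] = err.code
    return results


# Example call (assumed address, not taken from the mlx-vlm docs):
# print(probe_methods("http://localhost:8000/health"))
```

If one method returns 200 and another returns 405, the endpoint itself is fine and the original request simply used a method that the route does not accept.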
I believe I'm on the latest version (pip install -U mlx-vlm mlx), with Python 3.13.3.
After control+C:
pip % /opt/homebrew/Cellar/python@3.13/3.13.3_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/multiprocessing/resource_tracker.py:301: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown: {'/mp-ibtcejdr'} warnings.warn(
Could you try with the latest version and let me know if the issue persists?