mlx-vlm
[WIP] Reduce deps core
Summary:
This PR removes the dependency on torch, torchvision, and transformers by porting the necessary processors directly into mlx-vlm. It also restructures pyproject.toml to support optional installations.
Changes:
- Removed Dependencies: Core installation no longer requires Torch or Transformers.
- New Extras: Added optional flags for `[trainer]`, `[server]`, and `[audio]`.
- Refactoring:
  - Replaced `mlx-audio` with `soundfile`.
  - Moved audio imports to be lazy-loaded within functions to avoid crashes for users without audio dependencies.
  - Cleaned up redundant imports in `utils.py`.
- Docs: Added installation instructions for optional dependencies to the README.
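For reviewers, a minimal sketch of the lazy-loading pattern described in the refactoring notes. The function name `load_audio` and the error message are hypothetical and not taken from the diff; only `soundfile` and the `[audio]` extra come from this PR.

```python
def load_audio(path):
    """Read an audio file, importing soundfile only when it is needed.

    Hypothetical helper: illustrates the pattern, not the actual code in this PR.
    """
    try:
        # Deferred import: importing the enclosing module never touches
        # soundfile, so users without the audio extra are unaffected.
        import soundfile as sf
    except ImportError as exc:
        raise ImportError(
            "Audio support requires the optional 'soundfile' dependency; "
            "install the [audio] extra to enable it."
        ) from exc

    # soundfile.read returns the sample array and the sample rate.
    data, sample_rate = sf.read(path)
    return data, sample_rate
```

With this shape, a missing dependency only surfaces when an audio function is actually called, and the error points at the extra to install.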
Sort of related, have you considered replacing py-opencv which pulls in a rather hefty set of deps (120+)? It looks like it's currently only used to load and resize the frames of videos.