Implement MLX support in Instructor to interact with LLMs on Apple Silicon (M1/M2/M3)
Is your feature request related to a problem? Please describe.
I'm interested in running LLMs locally on Apple Silicon (M1/M2/M3) using Instructor, but currently the library only supports OpenAI and compatible APIs. There is no native support for Apple's MLX framework, which is optimized for these devices. As a result, it's not possible to fully leverage the privacy, speed, and cost benefits of running LLMs directly on Mac hardware using Instructor.
Describe the solution you'd like
I'd like to see Instructor support MLX as a backend for model inference. This could be implemented as a new client or adapter, allowing users to pass prompts and receive structured outputs from locally hosted LLMs (such as Llama, Mistral, or Phi models running via MLX) in the same way they would with OpenAI. Ideally, the API would remain consistent, just swapping the backend.
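To make the ask concrete, here is a rough sketch of the kind of API I have in mind. `instructor.from_mlx` is hypothetical and does not exist today; only `mlx_lm.load`, `instructor.Mode.JSON`, and the `response_model` pattern are real, and the model id is just an example from the mlx-community hub:

```python
# Purely illustrative sketch: instructor.from_mlx does not exist yet; it is the
# adapter this issue is requesting. Model id and Mode choice are assumptions.
import instructor
from mlx_lm import load  # existing mlx-lm loader
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

# Hypothetical entry point mirroring instructor.from_openai(...)
client = instructor.from_mlx(model, tokenizer, mode=instructor.Mode.JSON)

user = client.chat.completions.create(
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old."}],
)
```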
Describe alternatives you've considered
I've considered using other frameworks or creating custom wrappers for MLX, but none offer the seamless, schema-driven and robust structured output experience Instructor provides. Other projects like Toolio are exploring MLX agents, but they don't have the same Pythonic interface or validation features.
Additional context
- Apple MLX repo: https://github.com/apple/mlx
- Example of LLM inference with MLX: https://github.com/ml-explore/mlx-examples
- This would make Instructor even more useful for privacy-conscious and offline-first applications, especially for Mac users.
- If needed, I'm happy to help test or provide feedback on this feature!
Is it possible to run something like this via Ollama or llama.cpp?
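For reference, Ollama (and llama.cpp's server) expose OpenAI-compatible endpoints, so something along these lines should already work with Instructor today; the base URL, model name, and JSON mode are assumptions for a typical local setup, not tested settings:

```python
# Sketch: structured output from a local model served by Ollama, which exposes
# an OpenAI-compatible endpoint at http://localhost:11434/v1 by default.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


client = instructor.from_openai(
    OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    mode=instructor.Mode.JSON,  # prompt-based JSON; local servers often lack native tool calling
)

user = client.chat.completions.create(
    model="llama3",  # assumed: any model already pulled into Ollama
    response_model=UserInfo,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old."}],
)
print(user)
```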
Is it possible the problem is the architecture? The Apple platform is a different architecture, and on a regular PC the CPU and memory go crazy because they don't interact well with the LLM.
@jxnl I can work on this feature to implement MLX support so Instructor works better on Apple devices.
Am also interested in this:
- Ollama uses llama.cpp - which uses Apple Metal
- MLX from Apple is its own array framework that runs directly on the GPU via Metal and takes advantage of unified memory (rather than going through CoreML or the Neural Engine)
For some models, MLX can be faster - and it would be great to use a single tool like Instructor to run apples-to-apples (pun intended 🍎) comparisons.
The integration could be done with https://github.com/ml-explore/mlx-lm, which may be easier to use.
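For context, a minimal mlx-lm generation call looks roughly like this (the model id is just an example from the mlx-community hub); an Instructor adapter would essentially wrap this call and add schema-driven prompting, Pydantic validation, and retries on top:

```python
# Minimal mlx-lm usage an Instructor adapter would have to wrap.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

# Build a chat-formatted prompt using the model's own template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Return JSON with keys name and age for: Jason is 25."}],
    tokenize=False,
    add_generation_prompt=True,
)

# An adapter would parse and validate this raw text against a Pydantic model,
# retrying on validation errors the way Instructor does for other backends.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```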