
Support local models with the mlx inference engine

OKHand-Zy opened this issue 3 months ago • 2 comments

Enhancement: support local and custom models #165

This is a modified version of my existing code, although the code quality may not be ideal. It supports running models from a local path with mlx, for both the CLI and the ChatAPI. (For now, you still need to manually place the model in the ~/.cache/exo directory before use. I'm working on automating this, but my unfamiliarity with gRPC means it needs further research.)
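In simplified form, the manual placement step amounts to something like the sketch below. The `install_local_model` helper and the `config.json` check are illustrative, not the actual code from this patch:

```python
import shutil
from pathlib import Path

EXO_CACHE = Path.home() / ".cache" / "exo"

def install_local_model(src: Path) -> Path:
    """Copy an mlx model directory (weights + config.json) into exo's cache."""
    # Illustrative check: mlx-converted models ship a config.json next to the weights.
    if not (src / "config.json").exists():
        raise FileNotFoundError(f"{src} does not look like an mlx model directory")
    dest = EXO_CACHE / src.name
    EXO_CACHE.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

After this step, the model's name matches its directory name under ~/.cache/exo, which is what the local-path lookup resolves against.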

However, I encountered an issue during testing: the CLI response often freezes. My hypothesis is that network latency on the host side, caused by multiple network hops, is disrupting the delivery of data chunks.

I would greatly appreciate any suggestions on how to improve or optimize the code for better results.

Changes:

  • Added a "How to use local models" section to the README.
  • Implemented init_exo_env to configure local model cards and the local model store (see the sketch after this list).
  • Added bypass logic for local models using if...else statements.
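To make the last two items concrete, here is a rough sketch of the shape of those changes. The names `model_cards`, `resolve_model`, and `download_from_hub` are illustrative stand-ins for the real exo internals, which are not reproduced here:

```python
from pathlib import Path

LOCAL_MODEL_STORE = Path.home() / ".cache" / "exo"

# Model cards map a model name to metadata about where its weights live.
model_cards: dict[str, dict] = {}

def init_exo_env() -> None:
    """Register a local model card for every model found in the store."""
    LOCAL_MODEL_STORE.mkdir(parents=True, exist_ok=True)
    for entry in LOCAL_MODEL_STORE.iterdir():
        # Treat any directory containing a config.json as an mlx model.
        if entry.is_dir() and (entry / "config.json").exists():
            model_cards[entry.name] = {"path": str(entry), "local": True}

def download_from_hub(model_name: str) -> str:
    """Placeholder for exo's normal remote-download path."""
    raise NotImplementedError(f"{model_name} is not in the local store")

def resolve_model(model_name: str) -> str:
    """Bypass the download when a local card exists, else fall through."""
    card = model_cards.get(model_name)
    if card is not None and card.get("local"):
        return card["path"]  # local model: load straight from disk
    else:
        return download_from_hub(model_name)
```

Keeping the lookup table and the fallback together like this confines the if...else bypass to one place instead of scattering it through the inference engine.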

OKHand-Zy avatar Nov 20 '24 06:11 OKHand-Zy