Alex Cheema
Watching this. I think some users have run into this issue on exo, e.g. https://github.com/exo-explore/exo/issues/152
@blindcrone you're running this on your linux box right? Could you take a look at what might be the issue here? Thanks!
You can force it to use MLX by running exo like this: `exo --inference-engine mlx`. Does that work for you? Still need to fix detection of Apple Silicon ofc but...
Can you please run this @smokk89 `python -c "import sys; print('Platform:', sys.platform); import platform; print('Machine:', platform.machine())"`
> @AlexCheema Thank you just ran it. Really weird how I am seeing X86 as machine on my M1 Mac but M3 is showing Arm as machine. Any reason why...
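A likely explanation for the x86 reading on an M1 Mac is an x86_64 Python build running under Rosetta 2, in which case `platform.machine()` reports `x86_64`. Here's a minimal sketch of detection that accounts for that case (this is an illustration, not exo's actual detection code; `is_apple_silicon` is a hypothetical helper):

```python
import platform
import subprocess
import sys


def is_apple_silicon() -> bool:
    """Detect Apple Silicon even when Python runs under Rosetta 2.

    An x86_64 Python build on an M-series Mac reports 'x86_64' from
    platform.machine(), so we also check the sysctl.proc_translated
    flag, which is 1 for Rosetta-translated processes.
    """
    if sys.platform != "darwin":
        return False
    if platform.machine() == "arm64":
        return True
    # x86_64 Python may still be running translated on Apple Silicon
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        ).stdout.strip()
        return out == "1"
    except FileNotFoundError:
        return False


print("Apple Silicon:", is_apple_silicon())
```

If that's what's happening, reinstalling an arm64 (native) Python build should also make `platform.machine()` report `arm64` directly.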
Closing as above suggestion should fix this. Please re-open if you still have this issue.
Why is this a "bad" output? tinygrad and MLX are using slightly different models. It's one of the magical things about exo: different models are interoperable.
> in the tinygrad screenshot it hasn't answered what I've asked in the second prompt at all. Try having a conversation with 1B using MLX and then tinygrad, I'm just...
> @AlexCheema Yea, this looks like a context bug to me, and makes an argument for spending some time reconciling the different caching methods between these implementations, and fully utilizing...
This is awesome! Much-needed addition. I'm going to assign a $500 retrospective bounty for this if we can get a minimal tokenizer implementation working for all models without any...