Alex Cheema
Watching this. I think some users have run into this issue on exo, e.g. https://github.com/exo-explore/exo/issues/152
@blindcrone you're running this on your linux box right? Could you take a look at what might be the issue here? Thanks!
You can force it to use MLX by running exo like this: `exo --inference-engine mlx`. Does that work for you? Still need to fix detection of Apple Silicon ofc but...
Can you please run this @smokk89 `python -c "import sys; print('Platform:', sys.platform); import platform; print('Machine:', platform.machine())"`
> @AlexCheema Thank you just ran it. Really weird how I am seeing X86 as machine on my M1 Mac but M3 is showing Arm as machine. Any reason why...
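A likely explanation for the x86 reading on an M1 Mac is an x86_64 Python build running under Rosetta 2, in which case `platform.machine()` reports `x86_64`. Here's a minimal sketch of detection that accounts for that case (this is an illustration, not exo's actual detection code; `is_apple_silicon` is a hypothetical helper):

```python
import platform
import subprocess
import sys


def is_apple_silicon() -> bool:
    """Detect Apple Silicon even when Python runs under Rosetta 2.

    An x86_64 Python build on an M-series Mac reports 'x86_64' from
    platform.machine(), so we also check the sysctl.proc_translated
    flag, which is 1 for Rosetta-translated processes.
    """
    if sys.platform != "darwin":
        return False
    if platform.machine() == "arm64":
        return True
    # x86_64 Python may still be running translated on Apple Silicon
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        ).stdout.strip()
        return out == "1"
    except FileNotFoundError:
        return False


print("Apple Silicon:", is_apple_silicon())
```

If that's what's happening, reinstalling an arm64 (native) Python build should also make `platform.machine()` report `arm64` directly.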
Closing as above suggestion should fix this. Please re-open if you still have this issue.
Why is this a "bad" output? tinygrad and MLX are using slightly different models. It's one of the magical things about exo: different models are interoperable.
> in the tinygrad screenshot it hasn't answered what I've asked in the second prompt at all. Try having a conversation with 1B using MLX and then tinygrad, I'm just...
> @AlexCheema Yea, this looks like a context bug to me, and makes an argument for spending some time reconciling the different caching methods between these implementations, and fully utilizing...
This is awesome! Much-needed addition. I'm going to assign a $500 retrospective bounty for this if we can get a minimal tokenizer implementation working for all models without any...