Alex Cheema
Can you try running with `SUPPORT_BF16=0`
> Thanks for the reply! I assume that `SUPPORT_BF16=0` means smaller quantized weights accuracy, right? I gave it a shot, but it doesn't work. Maybe the problem stems from the...
Thanks for reporting this. Cool you have 5 Linux boxes set up. Can you run on one of the nodes with `DEBUG=6` and paste the logs here? This is definitely...
Does this work when you run on only 2 of the CLANG boxes? I can't reproduce on my end; I'm wondering if it's some edge case when running with more nodes...
Currently exo does pipeline-parallel inference, which is faster than offloading when a single device can't fit the entire model. If a single device can fit the entire model,...
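To illustrate the idea (this is a hypothetical sketch, not exo's actual partitioning code): pipeline parallelism gives each node a contiguous slice of the model's layers, for example in proportion to each node's available memory, so every layer stays resident on some device instead of being swapped through one device as in offloading.

```python
# Illustrative sketch only -- not exo's real implementation.
# Split a model's layers across nodes in proportion to each node's
# (hypothetical) available memory, the basic idea behind
# pipeline-parallel inference.
def partition_layers(num_layers, node_memory_gb):
    total = sum(node_memory_gb)
    bounds, start = [], 0
    for i, mem in enumerate(node_memory_gb):
        # The last node takes the remainder so rounding leaves no gaps.
        if i == len(node_memory_gb) - 1:
            end = num_layers
        else:
            end = start + round(num_layers * mem / total)
        bounds.append((start, end))
        start = end
    return bounds

# e.g. 32 layers over nodes with 8 GB, 16 GB and 8 GB of memory:
print(partition_layers(32, [8, 16, 8]))  # [(0, 8), (8, 24), (24, 32)]
```

Each node then runs only its slice and forwards activations to the next node, which is why this beats repeatedly paging weights through a single device when the model doesn't fit in one place.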
Try running with `SUPPORT_BF16=0` e.g. `SUPPORT_BF16=0 python3 main.py`. Can you let me know if that works? Ideally we detect this automatically.
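On the "ideally we detect this automatically" point, a minimal sketch of what such detection might look like, assuming a tinygrad backend (the helper name and fallback logic here are illustrative, not exo's actual code):

```python
# Hypothetical auto-detection sketch, not exo's real implementation:
# probe whether the current backend can execute a bfloat16 op, and set
# SUPPORT_BF16 accordingly if the user hasn't set it themselves.
import os

def bf16_supported() -> bool:
    try:
        from tinygrad import Tensor, dtypes
        # Run a tiny bfloat16 op; backends without bf16 support raise here.
        (Tensor([1.0], dtype=dtypes.bfloat16) + 1).realize()
        return True
    except Exception:
        return False

# Respect an explicit user override; otherwise probe the backend.
if "SUPPORT_BF16" not in os.environ:
    os.environ["SUPPORT_BF16"] = "1" if bf16_supported() else "0"
```

The probe-and-fallback shape means users on backends without bfloat16 wouldn't need to know about the flag at all, while anyone who sets it explicitly still wins.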
The example was outdated. I've removed it now and added another example: https://github.com/exo-explore/exo/commit/5a9f4ba5c1f4b1258b91350714d6e68b6195eb8a

I see you're using conda, in which case you may need to follow the conda instructions here...
> Hi Alex, I'd like to take up this bounty. Can you elaborate more on what needs to be done? Just the title and the commit are not the...
> Hi @AlexCheema,
>
> I would like to take up this bounty.
>
> As I was looking into the existing implementations, I found that `tinygrad` only supports llama...
Assigned @Sanchay-T