Alex Cheema


Can you try running with `SUPPORT_BF16=0`?

> Thanks for the reply! I assume that `SUPPORT_BF16=0` means smaller quantized weights accuracy, right? I gave it a shot, but it doesn't work. Maybe the problem stems from the...

Thanks for reporting this. Cool you have 5 Linux boxes set up. Can you run on one of the nodes with `DEBUG=6` and paste the logs here? This is definitely...
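For reference, a debug run like the one requested above can be captured to a file for pasting into the issue. A minimal sketch (the inline `python3 -c` stand-in and the `exo-debug.log` filename are illustrative; in practice you would run exo's own entry point):

```shell
# DEBUG=6 sets the env var for this one command only; `tee` both prints
# the output and saves it so the full logs can be attached to the issue.
DEBUG=6 python3 -c 'import os; print("DEBUG level:", os.environ["DEBUG"])' 2>&1 | tee exo-debug.log
```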

Does this work when you run on 2 of the CLANG boxes only? I can't reproduce on my end, wondering if it's some edge case when running with more nodes...

Currently exo does pipeline parallel inference, which is faster than offloading when a single device can't fit the entire model. If a single device can fit the entire model,...
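To illustrate the pipeline-parallel idea above: each node holds a contiguous slice of the model's layers and passes activations to the next node. A minimal sketch of one way to size those slices, in proportion to each node's memory (illustrative only, not exo's actual partitioning code; the function name and memory-based heuristic are assumptions):

```python
def partition_layers(num_layers: int, node_memory: list[int]) -> list[range]:
    """Assign each node a contiguous slice of layers, sized in proportion
    to its available memory, so activations flow node-to-node like a pipeline.
    Hypothetical helper for illustration; exo's real strategy may differ."""
    total = sum(node_memory)
    slices, start, acc = [], 0, 0
    for mem in node_memory:
        acc += mem
        # Cumulative share of layers owed to the nodes seen so far.
        end = round(num_layers * acc / total)
        slices.append(range(start, end))
        start = end
    return slices

# e.g. a 32-layer model across nodes with 16/8/8 GB of memory:
# partition_layers(32, [16, 8, 8]) -> [range(0, 16), range(16, 24), range(24, 32)]
```

Because the slices are contiguous, only the boundary activations cross the network, which is why this beats repeatedly offloading weights once the model spans multiple devices.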

Try running with `SUPPORT_BF16=0`, e.g. `SUPPORT_BF16=0 python3 main.py`. Can you let me know if that works? Ideally we'd detect this automatically.
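Since the comment above mentions detecting this automatically, here is a hedged sketch of one way that could work: probe the backend with a tiny bfloat16 operation at startup and set the env var from the result. The `run_bf16_op` callable is hypothetical (it stands in for whatever backend kernel would be exercised), not exo's API:

```python
import os

def bf16_supported(run_bf16_op) -> bool:
    """Return True if a trial bfloat16 operation succeeds on this backend.
    `run_bf16_op` is a hypothetical callable supplied by the backend that
    executes a tiny bf16 kernel; any exception is treated as 'unsupported'."""
    try:
        run_bf16_op()
        return True
    except Exception:
        return False

# At startup, set SUPPORT_BF16 only if the user hasn't set it explicitly,
# so the manual workaround above still takes precedence:
# os.environ.setdefault("SUPPORT_BF16", "1" if bf16_supported(probe) else "0")
```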

The example was outdated. I've removed it now and added another example: https://github.com/exo-explore/exo/commit/5a9f4ba5c1f4b1258b91350714d6e68b6195eb8a I see you're using conda, in which case you may need to follow the conda instructions here...

> Hi Alex, I'd like to take up on this bounty. Can you elaborate more on what needs to be done? Just the title and the commit are not the...

> Hi @AlexCheema, > > I would like to take up this bounty. > > As I was looking into the existing implementations, I found that `tinygrad` only supports llama...