Alex Cheema
Can you try running with `SUPPORT_BF16=0`
> Thanks for the reply! I assume that `SUPPORT_BF16=0` means smaller quantized weights accuracy, right? I gave it a shot, but it doesn't work. Maybe the problem stems from the...
Thanks for reporting this. Cool you have 5 Linux boxes set up. Can you run on one of the nodes with `DEBUG=6` and paste the logs here? This is definitely...
Does this work when you run on only 2 of the CLANG boxes? I can't reproduce on my end; I'm wondering if it's some edge case when running with more nodes...
Currently exo does pipeline-parallel inference, which is faster than offloading when a single device can't fit the entire model. If a single device can fit the entire model,...
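To illustrate the idea (this is a hypothetical sketch, not exo's actual partitioning code): pipeline parallelism gives each node a contiguous slice of the model's layers, for example in proportion to each node's available memory, so every layer stays resident on some device instead of being swapped through one device as in offloading.

```python
# Illustrative sketch only -- not exo's real implementation.
# Split a model's layers across nodes in proportion to each node's
# (hypothetical) available memory, the basic idea behind
# pipeline-parallel inference.
def partition_layers(num_layers, node_memory_gb):
    total = sum(node_memory_gb)
    bounds, start = [], 0
    for i, mem in enumerate(node_memory_gb):
        # The last node takes the remainder so rounding leaves no gaps.
        if i == len(node_memory_gb) - 1:
            end = num_layers
        else:
            end = start + round(num_layers * mem / total)
        bounds.append((start, end))
        start = end
    return bounds

# e.g. 32 layers over nodes with 8 GB, 16 GB and 8 GB of memory:
print(partition_layers(32, [8, 16, 8]))  # [(0, 8), (8, 24), (24, 32)]
```

Each node then runs only its slice and forwards activations to the next node, which is why this beats repeatedly paging weights through a single device when the model doesn't fit in one place.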
Try running with `SUPPORT_BF16=0` e.g. `SUPPORT_BF16=0 python3 main.py`. Can you let me know if that works? Ideally we detect this automatically.
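On the "ideally we detect this automatically" point, a minimal sketch of what such detection might look like, assuming a tinygrad backend (the helper name and fallback logic here are illustrative, not exo's actual code):

```python
# Hypothetical auto-detection sketch, not exo's real implementation:
# probe whether the current backend can execute a bfloat16 op, and set
# SUPPORT_BF16 accordingly if the user hasn't set it themselves.
import os

def bf16_supported() -> bool:
    try:
        from tinygrad import Tensor, dtypes
        # Run a tiny bfloat16 op; backends without bf16 support raise here.
        (Tensor([1.0], dtype=dtypes.bfloat16) + 1).realize()
        return True
    except Exception:
        return False

# Respect an explicit user override; otherwise probe the backend.
if "SUPPORT_BF16" not in os.environ:
    os.environ["SUPPORT_BF16"] = "1" if bf16_supported() else "0"
```

The probe-and-fallback shape means users on backends without bfloat16 wouldn't need to know about the flag at all, while anyone who sets it explicitly still wins.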
The example was outdated. I've removed it now and added another example: https://github.com/exo-explore/exo/commit/5a9f4ba5c1f4b1258b91350714d6e68b6195eb8a

I see you're using conda, in which case you may need to follow the conda instructions here...
> Hi Alex, I'd like to take up this bounty. Can you elaborate more on what needs to be done? Just the title and the commit are not the...
> Hi @AlexCheema,
>
> I would like to take up this bounty.
>
> As I was looking into the existing implementations, I found that `tinygrad` only supports llama...
Assigned @Sanchay-T