Ali Ladjevardi
Hi, I started working on this; my WIP PR is [#630](https://github.com/exo-explore/exo/pull/630). As @varshith15 pointed out, naively dequantizing weights before each forward pass is not performant; however, this is not an...
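To make that cost concrete, here is a minimal NumPy sketch of the naive pattern. It assumes symmetric per-row int8 quantization, and `NaiveQuantLinear` and `dequant_calls` are hypothetical names, not code from the PR. Every call rebuilds the full float32 weight matrix, so the dequantization work and a weight-sized temporary are paid again on every forward pass.

```python
import numpy as np

# Hypothetical int8 weight-only quantized linear layer (illustrative only).
class NaiveQuantLinear:
    def __init__(self, w: np.ndarray):
        # Symmetric per-output-row quantization: w ~= q * scale
        self.scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
        self.scale[self.scale == 0] = 1.0  # avoid dividing by zero for all-zero rows
        self.q = np.round(w / self.scale).astype(np.int8)
        self.dequant_calls = 0

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Naive: materialize the full float weight matrix on every forward pass.
        self.dequant_calls += 1
        w = self.q.astype(np.float32) * self.scale
        return x @ w.T

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)
x = rng.standard_normal((2, 16)).astype(np.float32)
layer = NaiveQuantLinear(w)
for _ in range(3):  # three forward passes -> three full dequantizations
    y = layer(x)
print(layer.dequant_calls)  # 3
```

The output matches `x @ w.T` up to quantization error; the point is only that the dequantized weights are rebuilt every call rather than reused.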
I've found the issue with sharding both the model and the data. It's in the Embedding layer; it boils down to:

```python
from tinygrad import Tensor, nn
B, T = 4, 64
vocab_size...
```
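A sketch of why an embedding lookup can blow up memory, assuming the lookup is written as a one-hot broadcast (compare indices against an arange, multiply by the weight, sum over the vocab axis), which as far as I know is roughly how tinygrad's `nn.Embedding` expresses it. This uses NumPy rather than tinygrad; `B, T = 4, 64` come from the snippet above, while `vocab` and `embed` are hypothetical small sizes since the real `vocab_size` is cut off.

```python
import numpy as np

def onehot_embedding(idx: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Embedding lookup as compare + multiply + sum (no gather)."""
    vocab, embed = weight.shape
    # (B, T, 1, 1) == (vocab, 1) -> one-hot mask broadcast to (B, T, vocab, 1)
    mask = idx[..., None, None] == np.arange(vocab)[:, None]
    # Broadcasting against weight (vocab, embed) materializes (B, T, vocab, embed)
    big = mask * weight
    return big.sum(axis=-2)  # reduce over vocab -> (B, T, embed)

rng = np.random.default_rng(0)
B, T, vocab, embed = 4, 64, 100, 32  # vocab/embed are hypothetical
weight = rng.standard_normal((vocab, embed)).astype(np.float32)
idx = rng.integers(0, vocab, size=(B, T))
out = onehot_embedding(idx, weight)
# Elements in the broadcast intermediate vs. in the output
print(B * T * vocab * embed, out.size)  # 819200 8192
```

The intermediate is `vocab` times larger than the output, so at a real vocabulary size it is an easy OOM candidate. With the weight sharded along one axis and `idx` sharded along the batch axis, this broadcast is also where the two shardings meet.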
@chenyuxyz Yeah, I was waiting for `cast_before_view`; now that it's merged, I'm continuing the work and will update soon.

Update 1:
- Embedding still has an OOM issue; copy is still stuck after...
COPY after expand in multi-GPU only happens in one case: an ALU op on two MULTI inputs sharded along different axes, where the source that ends up being copied is the result of an...
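A toy model of that case, written in NumPy with hypothetical names (`Sharded`, `sharded_add`); it is not tinygrad's multi-device implementation. When both inputs are split along the same axis, the elementwise op runs shard-by-shard with no data movement. When the axes differ, one input has to be gathered and re-split, which is the cross-device copy.

```python
import numpy as np

class Sharded:
    """Hypothetical stand-in for a tensor split across devices along one axis."""
    def __init__(self, shards, axis):
        self.shards, self.axis = shards, axis

    @classmethod
    def split(cls, full, axis, n):
        return cls(np.split(full, n, axis=axis), axis)

    def gather(self):
        # Stands in for copying every shard to one place.
        return np.concatenate(self.shards, axis=self.axis)

def sharded_add(a, b):
    """Elementwise add; returns (result, whether b had to be resharded)."""
    resharded = a.axis != b.axis
    if resharded:
        # Shards don't line up: gather b and re-split it along a's axis.
        b = Sharded.split(b.gather(), a.axis, len(a.shards))
    out = [x + y for x, y in zip(a.shards, b.shards)]
    return Sharded(out, a.axis), resharded

full_a = np.arange(16.0).reshape(4, 4)
full_b = np.ones((4, 4))
a = Sharded.split(full_a, axis=0, n=2)
r1, copied1 = sharded_add(a, Sharded.split(full_b, axis=0, n=2))
r2, copied2 = sharded_add(a, Sharded.split(full_b, axis=1, n=2))
print(copied1, copied2)  # False True
```

If the resharded input is the output of an expand, the gathered data is the expanded (larger) tensor, which is why the copy can be expensive.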
Also, I'm doing the math in float32 right now, which adds overhead. When I change it to float16, I think something overflows and the model outputs nothing. I will fix this.
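One plausible mechanism for "outputs nothing", not confirmed as the cause here: float16 tops out at 65504, so a mean-of-squares style reduction (as in an RMS norm) can hit `inf`, and dividing by `sqrt(inf)` zeroes the whole row. The `rms_norm` below is a hypothetical NumPy illustration, not the model's code.

```python
import numpy as np

def rms_norm(x, eps=1e-5):
    # x * x is where large activations can exceed the float16 range
    return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

x = np.array([[300.0, 1.0, -2.0, 0.5]])
out16 = rms_norm(x.astype(np.float16))  # 300**2 = 90000 > 65504 -> inf -> row becomes 0
out32 = rms_norm(x.astype(np.float32))  # same math in float32 stays finite
print(out16)
print(out32)
```

NumPy emits an overflow `RuntimeWarning` for the float16 multiply. A common mitigation is to keep reductions like this in float32 and cast the result back to float16.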