Eval working
Added facilities for processing examples (which currently consist of an input, a target, and a length) and for evaluating them against a (currently hard-defaulted) loss function in a distributed fashion across the shards of the exo network.
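To make the shape of this concrete, here's a minimal sketch of what such an example and a hard-defaulted loss could look like. This is not the actual code from the branch; the `Example` fields and `masked_loss` helper are hypothetical, and it uses numpy rather than MLX just to illustrate the length-masked loss idea.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical shape of a training example as described above:
# an input sequence, a target sequence, and a length for masking padding.
@dataclass
class Example:
    input: np.ndarray   # token ids, shape (seq_len,)
    target: np.ndarray  # token ids, shape (seq_len,)
    length: int         # number of valid (non-padding) positions

def masked_loss(logits: np.ndarray, example: Example) -> float:
    """Length-masked cross-entropy, one plausible hard default."""
    # numerically stable log-softmax over the vocab dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # negative log-likelihood of the targets, averaged over valid positions only
    nll = -log_probs[np.arange(len(example.target)), example.target]
    return float(nll[: example.length].mean())

# toy usage: vocab of 4, sequence of 3 with 2 valid positions
ex = Example(input=np.array([1, 2, 0]), target=np.array([2, 3, 0]), length=2)
logits = np.zeros((3, 4))
loss = masked_loss(logits, ex)  # uniform logits, so loss == log(4)
```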
A lot of the plumbing here will also make distributed training easier.
@blindcrone I put in some comments that I hope are useful. Note that these are all somewhat stylistic/superficial, since I haven't actually fetched or tested the branch on my machine.
Okay, this now in theory trains across nodes on MLX. I'll need to add the ability to save the weights somewhere in order to see how well it actually does, and it'd be nice to get tinygrad working this way too.
Also, I don't think the backprop approximation loss is exactly a correct approximation, so if anyone remembers the backprop equations better and has a suggestion, please do educate me.
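For reference, exact backprop across shards doesn't need an approximation at all: each shard caches its activations on the forward pass, then during the backward pass receives d(loss)/d(output) from the next shard and sends d(loss)/d(input) to the previous one. A minimal numpy sketch (not the branch's code; `LinearShard` is a hypothetical stand-in for a pipeline stage):

```python
import numpy as np

class LinearShard:
    """Toy pipeline stage: y = x @ w, with exact chain-rule backward."""
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        self.x = x                          # cache input for backward
        return x @ self.w
    def backward(self, grad_out):
        self.grad_w = self.x.T @ grad_out   # local parameter gradient
        return grad_out @ self.w.T          # gradient passed to previous shard

rng = np.random.default_rng(0)
shards = [LinearShard(rng.normal(size=(4, 4))) for _ in range(3)]

x = rng.normal(size=(2, 4))
h = x
for s in shards:             # forward pass hops shard to shard
    h = s.forward(h)
loss = 0.5 * (h ** 2).sum()  # toy loss
grad = h                     # d(loss)/d(h) for this loss
for s in reversed(shards):   # backward pass hops in reverse order
    grad = s.backward(grad)
# grad is now d(loss)/d(x), computed exactly across the shard boundary
```

In a real multi-node setup the `forward`/`backward` hand-offs would be network sends rather than function calls, but the math is the same: no approximation is required, only an extra reverse pass over the shard chain.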