Eval working
Added facilities for processing examples (which currently consist of an input, a target, and a length) and for evaluating them against a (currently hard-defaulted) loss function in a distributed fashion across the shards of the exo network.
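To make the shape of this concrete, here's a minimal sketch of what such an example and a hard-defaulted loss could look like. This is not the actual code from the branch; the `Example` fields and `masked_loss` helper are hypothetical, and it uses numpy rather than MLX just to illustrate the length-masked loss idea.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical shape of a training example as described above:
# an input sequence, a target sequence, and a length for masking padding.
@dataclass
class Example:
    input: np.ndarray   # token ids, shape (seq_len,)
    target: np.ndarray  # token ids, shape (seq_len,)
    length: int         # number of valid (non-padding) positions

def masked_loss(logits: np.ndarray, example: Example) -> float:
    """Length-masked cross-entropy, one plausible hard default."""
    # numerically stable log-softmax over the vocab dimension
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # negative log-likelihood of the targets, averaged over valid positions only
    nll = -log_probs[np.arange(len(example.target)), example.target]
    return float(nll[: example.length].mean())

# toy usage: vocab of 4, sequence of 3 with 2 valid positions
ex = Example(input=np.array([1, 2, 0]), target=np.array([2, 3, 0]), length=2)
logits = np.zeros((3, 4))
loss = masked_loss(logits, ex)  # uniform logits, so loss == log(4)
```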
A lot of the plumbing here will also make distributed training easier.
@blindcrone I put in some comments that I hope are useful. Note that these are all somewhat stylistic/superficial, since I haven't actually fetched or tested the branch on my machine.
Okay, this now in theory trains across nodes on MLX. I'll need to add the ability to save the weights somewhere in order to see how well it actually does, and it'd be nice to get tinygrad working this way too.
Also, I don't think the backprop approximation loss is exactly a correct approximation, so if anyone remembers the backprop equations better and has a suggestion, please do educate me.
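For reference, exact backprop across shards doesn't need an approximation at all: each shard caches its activations on the forward pass, then during the backward pass receives d(loss)/d(output) from the next shard and sends d(loss)/d(input) to the previous one. A minimal numpy sketch (not the branch's code; `LinearShard` is a hypothetical stand-in for a pipeline stage):

```python
import numpy as np

class LinearShard:
    """Toy pipeline stage: y = x @ w, with exact chain-rule backward."""
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        self.x = x                          # cache input for backward
        return x @ self.w
    def backward(self, grad_out):
        self.grad_w = self.x.T @ grad_out   # local parameter gradient
        return grad_out @ self.w.T          # gradient passed to previous shard

rng = np.random.default_rng(0)
shards = [LinearShard(rng.normal(size=(4, 4))) for _ in range(3)]

x = rng.normal(size=(2, 4))
h = x
for s in shards:             # forward pass hops shard to shard
    h = s.forward(h)
loss = 0.5 * (h ** 2).sum()  # toy loss
grad = h                     # d(loss)/d(h) for this loss
for s in reversed(shards):   # backward pass hops in reverse order
    grad = s.backward(grad)
# grad is now d(loss)/d(x), computed exactly across the shard boundary
```

In a real multi-node setup the `forward`/`backward` hand-offs would be network sends rather than function calls, but the math is the same: no approximation is required, only an extra reverse pass over the shard chain.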