Thomas Capelle
A good approach is to put a param on the model, like:

```python
teacher_forcing_prob = 0
...
def forward(self, x, targets):
    if self.teacher_forcing_prob:
        # replace the model input with...
```
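For concreteness, here is a minimal sketch of that idea. The class, layer sizes, and the shifted-targets trick are my assumptions about what "replace the model input" could mean, not part of the original comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLM(nn.Module):  # hypothetical model, for illustration only
    def __init__(self, vocab_size=100, d_model=64, teacher_forcing_prob=0.0):
        super().__init__()
        self.teacher_forcing_prob = teacher_forcing_prob
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x, targets=None):
        # With probability teacher_forcing_prob, swap the input for the
        # ground-truth targets shifted right by one (assumed scheme)
        if self.training and targets is not None:
            if torch.rand(()) < self.teacher_forcing_prob:
                x = torch.cat([x[:, :1], targets[:, :-1]], dim=1)
        logits = self.head(self.embed(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss
```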
Thanks =)
Thanks! We can log everything tidily to W&B if you add any kind of configuration manager.
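As a rough sketch of what that could look like (the project name and hyperparameters here are made up):

```python
import wandb

# hypothetical hyperparameters gathered by whatever config manager is used
config = dict(n_layer=12, n_head=12, learning_rate=6e-4, batch_size=64)

run = wandb.init(project="my-project", config=config)

# then, inside the training loop:
wandb.log({"train/loss": 0.123, "lr": config["learning_rate"]})
```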
Do you think it is possible to make the logs public, @karpathy?
I want to test the no-POE training and compare it to this one. Could you please make the project public, @karpathy?
Trying to reproduce this, but I discovered that I am unable to tap into the `mps` GPU anymore using accelerate...
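Not from the original thread, but a quick way to check whether PyTorch itself still sees the `mps` backend before blaming accelerate:

```python
import torch

# True if an MPS device is available at runtime
print(torch.backends.mps.is_available())
# True if this PyTorch build was compiled with MPS support
print(torch.backends.mps.is_built())
```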
I am curious whether anyone has managed to run this on a laptop, outside of the Ultras.
Yes, I am using the provided mistral example. It's not a typo: it takes around 80 seconds to generate 1 token.
So, has anyone managed to run 7B inference using MLX on 16GB of RAM? Or do you need an Ultra to make any use of MLX?