Thomas Capelle

169 comments by Thomas Capelle

A good approach to do this is putting a param on the model, like:

```python
import random

teacher_forcing_prob = 0.0  # set on the model, e.g. in __init__
...

def forward(self, x, targets):
    # with probability teacher_forcing_prob, replace the model input
    # with the ground-truth targets (teacher forcing)
    if random.random() < self.teacher_forcing_prob:
        x = targets
    ...
```

Thanks! We can log everything tidily to W&B if you add any kind of configuration manager.
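For instance, a minimal sketch of what I mean (the project name and hyperparameters here are made up; in practice the config dict would come from your config manager, e.g. argparse, Hydra, or a dataclass):

```python
import wandb

# hypothetical hyperparameters standing in for a real config
config = {"lr": 3e-4, "batch_size": 64, "n_layers": 12}

run = wandb.init(project="nanogpt-experiments", config=config)

for step in range(100):
    loss = 1.0 / (step + 1)  # dummy loss standing in for a real training step
    wandb.log({"loss": loss})

run.finish()
```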

Do you think it is possible to make the logs public, @karpathy?

I want to test the no-POE training and compare it to this one. Could you please make the project public, @karpathy?

Trying to reproduce this, but I discovered that I am no longer able to tap into the `mps` GPU using accelerate...
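For reference, a minimal check I use to see whether PyTorch can reach the `mps` backend at all, independent of accelerate (just a diagnostic sketch):

```python
import torch

# check that PyTorch was built with MPS support and that the
# backend is actually reachable on this machine
print("built with MPS:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

if torch.backends.mps.is_available():
    x = torch.ones(3, device="mps")
    print(x * 2)  # should execute on the Apple GPU
```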

I am curious whether anyone has managed to run this on a laptop, i.e., outside of the Ultras.

Yes, I am using the provided `mistral` example. It's not a typo: it takes around 80 seconds to generate 1 token.

So, has anyone managed to run 7B inference using MLX on 16GB of RAM? Or do you need an Ultra to make any use of MLX?
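For context, a back-of-the-envelope sketch of why 16GB is tight for a 7B model in fp16 (rough numbers only; this ignores activations, the KV cache, and OS overhead):

```python
params = 7e9         # 7B parameters
bytes_per_param = 2  # fp16 weights
weights_gb = params * bytes_per_param / 1024**3
print(f"{weights_gb:.1f} GB just for the weights")  # ~13.0 GB
```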