Thomas Capelle
A good approach is to put a param on the model, like:

```python
teacher_forcing_prob = 0
...
def forward(self, x, targets):
    if self.teacher_forcing_prob:
        # replace the model input with...
```
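For concreteness, here is a minimal sketch of that idea. The class, layer sizes, and the shifted-targets trick are my assumptions about what "replace the model input" could mean, not part of the original comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLM(nn.Module):  # hypothetical model, for illustration only
    def __init__(self, vocab_size=100, d_model=64, teacher_forcing_prob=0.0):
        super().__init__()
        self.teacher_forcing_prob = teacher_forcing_prob
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x, targets=None):
        # With probability teacher_forcing_prob, swap the input for the
        # ground-truth targets shifted right by one (assumed scheme)
        if self.training and targets is not None:
            if torch.rand(()) < self.teacher_forcing_prob:
                x = torch.cat([x[:, :1], targets[:, :-1]], dim=1)
        logits = self.head(self.embed(x))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        return logits, loss
```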
Thanks =)
Thanks! We can log everything tidily to W&B if you add any kind of configuration manager.
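As a rough sketch of what that could look like (the project name and hyperparameters here are made up):

```python
import wandb

# hypothetical hyperparameters gathered by whatever config manager is used
config = dict(n_layer=12, n_head=12, learning_rate=6e-4, batch_size=64)

run = wandb.init(project="my-project", config=config)

# then, inside the training loop:
wandb.log({"train/loss": 0.123, "lr": config["learning_rate"]})
```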
Do you think it is possible to make the logs public, @karpathy?
I want to test the no-POE training and compare it to this one. Could you please make the project public, @karpathy?
Trying to reproduce this, but I discovered that I am unable to tap into the `mps` GPU anymore using accelerate...
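Not from the original thread, but a quick way to check whether PyTorch itself still sees the `mps` backend before blaming accelerate:

```python
import torch

# True if an MPS device is available at runtime
print(torch.backends.mps.is_available())
# True if this PyTorch build was compiled with MPS support
print(torch.backends.mps.is_built())
```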
I am curious whether anyone has managed to run this on a laptop, outside of the Ultras.
Yes, I am using the provided mistral example. It's not a typo: it takes around 80 seconds to generate 1 token.
So, has anyone managed to run 7B inference using MLX on 16GB of RAM? Or do you need an Ultra to make any use of MLX?