Sanj Ahilan

Results: 4 comments by Sanj Ahilan

I've started building a nanoChatGPT as a fork of Karpathy's nanoGPT. I also introduce a new idea for training: backpropagating through the reward function using the Gumbel-Softmax trick rather...
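The comment is cut off, but the Gumbel-Softmax trick it names is a standard way to get a differentiable approximation to sampling from a categorical distribution. A minimal NumPy sketch (the function name, temperature, and example logits here are illustrative, not taken from the comment):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable surrogate for a categorical sample: add Gumbel(0, 1)
    noise to the logits, then apply a temperature-scaled softmax.
    As tau -> 0 the output approaches a one-hot sample; larger tau
    gives a smoother, higher-entropy vector."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=logits.shape)
    # Gumbel(0, 1) noise via the inverse-CDF transform (eps avoids log(0)).
    gumbel = -np.log(-np.log(u + 1e-20) + 1e-20)
    y = (logits + gumbel) / tau
    y = y - y.max()  # numerical stability before exponentiating
    e = np.exp(y)
    return e / e.sum()

sample = gumbel_softmax(np.array([2.0, 0.5, -1.0]), tau=0.5)
# `sample` is a probability vector that tends to concentrate on one index,
# and every entry is a differentiable function of the logits.
```

Because the output is differentiable in the logits, a reward computed on it can be backpropagated straight through, which is presumably the idea the comment is describing.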

Ahh sorry, you also need to change:

```
run_experiments({
    "MADDPG_RLLib": {
        "run": "contrib/MADDPG",
    },
```

I've updated the PR but can't test this right now unfortunately, so let me know...
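For context, here is a hedged sketch of how a fragment like the one above fits into a full call to Ray Tune's legacy `run_experiments` API. This is a config sketch, not the PR's actual code: the env id and stop criterion are hypothetical placeholders.

```python
from ray.tune import run_experiments

run_experiments({
    "MADDPG_RLLib": {
        "run": "contrib/MADDPG",         # registered contrib trainable
        "env": "my_multiagent_env",      # hypothetical: your registered env id
        "stop": {"training_iteration": 100},  # illustrative stop criterion
    },
})
```

The top-level dict maps an experiment name to its spec; `"run"` selects the trainable by its registered string name.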

tfp is TensorFlow Probability, which I think you just need to pip install. https://www.tensorflow.org/probability Hopefully that works. Also, if you are able to benchmark this code it would be great....

I see the same when training a small model from scratch. Loss is 0.136, val loss is 12.42. Generated text looks OK. EDIT: On OpenWebText the losses look normal. My guess...