Leandro von Werra
Since this might be an issue with `fsdp` and `generate`, can you create a minimal reproducible example of the error?
Hi @Symbolk Regarding questions 1 & 3: I think there are two main reasons why the model performs worse than Codex: - We used considerably less compute. The model in...
You might find the insights in the AlphaCode paper interesting. They trained a decoder-only model from scratch on Python only and managed to match Codex's performance. They also did...
You could set `init_kl_coef=0` (see [here](https://github.com/lvwerra/trl/blob/750f5fd5329bb81c79b00243c4c8923ac14981d5/trl/ppo.py#L93)) to free the model from the reference model entirely, or increase the KL target `target` (which is 6 by default).
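For context, trl's adaptive KL controller updates the coefficient multiplicatively, so an initial value of 0 keeps the KL penalty off for the whole run. Here is a pure-Python sketch of that controller (a paraphrase of the idea, not the library code verbatim):

```python
class AdaptiveKLController:
    """Multiplicative KL-coefficient controller (sketch of trl's adaptive KL control)."""

    def __init__(self, init_kl_coef, target, horizon):
        self.value = init_kl_coef  # current KL penalty coefficient
        self.target = target       # desired KL between model and reference
        self.horizon = horizon     # controls how fast the coefficient adapts

    def update(self, current_kl, n_steps):
        # proportional error, clipped to avoid large jumps
        error = max(min(current_kl / self.target - 1, 0.2), -0.2)
        self.value *= 1 + error * n_steps / self.horizon


# With init_kl_coef=0 the coefficient stays 0 forever, since the update is multiplicative.
ctl = AdaptiveKLController(init_kl_coef=0.0, target=6.0, horizon=10000)
for _ in range(100):
    ctl.update(current_kl=12.0, n_steps=256)
print(ctl.value)  # → 0.0
```

With a nonzero initial coefficient, the controller raises the penalty whenever the observed KL exceeds `target` and lowers it when the model stays close to the reference.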
No, I have not experimented much with these parameters. The main motivation for using input text at all is to force some variation in the generation. Yes, I suspect one...
Hi @yananchen1989, the simple code demo is just a proof of concept and I never used that config for the actual training. I did not run many experiments changing...
I think the fine-tuning is not a necessary step, but it improves stability and convergence. As for the reward function, I don't see the point of a strictly positive reward. What would...
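One way to see why a strictly positive reward buys nothing: policy-gradient implementations (trl included) typically whiten the rewards/advantages per batch before the loss, and whitening removes any constant offset. A minimal sketch in plain Python (the `whiten` helper here is illustrative, not the library function):

```python
def whiten(values, eps=1e-8):
    """Normalize a list of values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var ** 0.5 + eps) for v in values]


rewards = [-1.0, 0.5, 2.0, -0.5]
shifted = [r + 10.0 for r in rewards]  # make every reward strictly positive

# The constant offset disappears after whitening: both batches yield the
# same normalized values, so the policy gradient is unchanged.
print(all(abs(a - b) < 1e-9 for a, b in zip(whiten(rewards), whiten(shifted))))  # → True
```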
Thanks for raising the issue. Can confirm that this is indeed a bug. I'll look into it!
Not yet. Since jupyterplot wraps [python-lrcurve](https://github.com/AndreasMadsen/python-lrcurve), it would be worth checking whether the issue persists there as well and, if so, raising it there at the source.
I think that makes sense. I have not used a seq2seq model yet, so you might want to start with a decoder-only model, which should work, and then compare...