Leandro von Werra


Since this might be an issue with the interaction of `fsdp` and `generate`, could you create a minimal reproducible example of the error?
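
For reference, a hypothetical skeleton of what such a repro could look like (model name and generation settings are placeholders; assumes the script is run with `accelerate launch` and an FSDP config):

```python
# repro_fsdp_generate.py -- hypothetical minimal repro skeleton
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()  # FSDP is enabled via the accelerate config
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = accelerator.prepare(AutoModelForCausalLM.from_pretrained("gpt2"))

inputs = tokenizer("Hello world", return_tensors="pt").to(accelerator.device)
with torch.no_grad():
    # the call where the reported error would presumably surface
    out = accelerator.unwrap_model(model).generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```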

Hi @Symbolk, regarding questions 1 & 3: I think there are two main reasons why the model performs worse than Codex:
- We used considerably less compute. The model in...

You might find the insights in the AlphaCode paper interesting. They did train a decoder-only model from scratch on Python only and managed to match Codex's performance. They also did...

You could set `init_kl_coef=0` (see [here](https://github.com/lvwerra/trl/blob/750f5fd5329bb81c79b00243c4c8923ac14981d5/trl/ppo.py#L93)) to decouple the model from the reference model completely, or increase the KL target `target` (which is 6 by default).
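
As a rough sketch, assuming the old `trl` 0.x API where these values are passed as keyword overrides to `PPOTrainer` (the exact signature depends on the version, and `model`/`ref_model` are assumed to be set up as in the trl examples):

```python
from trl.ppo import PPOTrainer

# hypothetical overrides of the defaults linked above
ppo_trainer = PPOTrainer(
    model,
    ref_model,
    init_kl_coef=0.0,  # 0 drops the KL penalty to the reference model entirely
    target=6,          # or raise this instead to loosen the adaptive KL control
)
```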

No, I have not experimented much with these parameters. The main motivation for using input text at all is to force some variation in the generation. Yes, I suspect one...
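
To illustrate the idea of varying the input text, a toy sketch (the prefixes are made up; in practice they would come from the task dataset): sampling a different prompt per generation keeps the completions from collapsing onto one pattern.

```python
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# hypothetical pool of input texts used purely to induce variation
prefixes = ["The movie was", "I went to see", "This product is"]

query = random.choice(prefixes)
inputs = tokenizer(query, return_tensors="pt")
out = model.generate(**inputs, do_sample=True, top_k=0, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```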

Hi @yananchen1989, the simple code demo is just a proof of concept and I never used that config for the actual training. I did not run many experiments changing...

I think the fine-tuning is not a necessary step, but it improves stability and convergence. For the reward function, I don't see the point of a strictly positive reward. What would...
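
To make the point about the reward sign concrete, a toy sketch (the scaling is an assumption for illustration, not the project's reward function): a bounded score in [0, 1] can simply be shifted so that low-quality samples receive a negative reward rather than a small positive one.

```python
import torch

def signed_reward(score: torch.Tensor) -> torch.Tensor:
    """Map a score in [0, 1] to a signed reward in [-1, 1]."""
    return 2.0 * score - 1.0

print(signed_reward(torch.tensor([0.1, 0.5, 0.9])))  # tensor([-0.8, 0.0, 0.8])
```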

Thanks for raising the issue. Can confirm that this is indeed a bug. I'll look into it!

Not yet. Since jupyterplot wraps [python-lrcurve](https://github.com/AndreasMadsen/python-lrcurve), it would be worth checking whether the issue persists there as well and, if so, raising it at the source.

I think that makes sense. I have not used a seq2seq model yet, so you might want to start with a decoder-only model, which should work, and then compare...