Drop GPT2 in our test in favour of a more recent instruct model
Initially suggested here https://github.com/huggingface/trl/pull/2144#discussion_r1787905943
It sounds like a cool first issue. I'd be up for working on it to explore the TRL project structure a bit more.
Thanks for the proposal! Feel free to open a pull request, and we'll assist you. 😊
@qgallouedec
It's mostly dummy-GPT2-correct-vocab. Do you want to replace those too?
Yes. We could try using Qwen2.5-0.5B-Instruct directly. Although it's a much larger model, it might still work fine, as long as it doesn't slow down or disrupt the tests. If it does, we'll need to create our own tiny version of it.
Hey @qgallouedec, I just ran some tests, and my biggest issue was the good old out-of-memory error when running them with Qwen2.5-0.5B-Instruct locally.
Is there any benefit in using a larger model, or would a smaller instruction-tuned model like HuggingFaceTB/SmolLM-135M-Instruct also be feasible? A model that small probably wouldn't cause much slowdown and is easier to run on most local machines.
Sorry for the late reply. No, there isn't any benefit to running the tests with larger models. Ideally we want to use tiny models for testing.
Hey @qgallouedec, just wanted to let you know that I'm still working on it. One more question: should gpt2 be replaced only in the actual trainer tests (cpo, bco, dpo, etc.), or in all the tests? It also appears in e.g. the callback tests.
Hi, since I didn't see a PR, I assumed you hadn't had time to deal with it, so I went ahead on my own :( I opted for a deeper rebuild and regenerated all the tiny models. I'll let you have a look at #2287. I'm always interested in feedback, though.
No problem. I had less time than I initially hoped and also didn't really keep you updated, so no worries. It's actually interesting for me to see how you're solving this ;)