
Drop GPT2 in our tests in favour of a more recent instruct model

Open • qgallouedec opened this issue 1 year ago • 7 comments

Initially suggested here https://github.com/huggingface/trl/pull/2144#discussion_r1787905943

qgallouedec • Oct 04 '24 15:10

It sounds like a cool first issue. I'd be up for working on it to explore the TRL project structure a bit more.

johko • Oct 07 '24 20:10

Thanks for the proposal! Feel free to open a pull request, and we'll assist you. 😊

qgallouedec • Oct 08 '24 08:10

@qgallouedec it's mostly dummy-GPT2-correct-vocab. Do you want to replace those too?

August-murr • Oct 10 '24 08:10

Yes. We could try using Qwen2.5-0.5B-Instruct directly. Although it's a much larger model, it might still work fine, as long as it doesn't slow down or disrupt the tests. If it does, we'll need to create our own tiny version of it.
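For a rough sense of what such a tiny version could look like (this is a minimal sketch, not something settled in this thread; the config sizes and the target repo id are assumptions):

```python
# Minimal sketch: build a tiny random-weight Qwen2.5-style model for tests.
# The layer/head sizes are illustrative, and the repo id is hypothetical.
from transformers import AutoTokenizer, Qwen2Config, Qwen2ForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

config = Qwen2Config(
    vocab_size=len(tokenizer),  # keep the real vocabulary so the tokenizer still matches
    hidden_size=16,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    intermediate_size=32,
)
model = Qwen2ForCausalLM(config)  # randomly initialized; only a few million parameters

# Hypothetical repo id for the test artifact.
model.push_to_hub("trl-internal-testing/tiny-Qwen2.5-0.5B-Instruct")
tokenizer.push_to_hub("trl-internal-testing/tiny-Qwen2.5-0.5B-Instruct")
```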

qgallouedec • Oct 10 '24 09:10

Hey @qgallouedec, I just ran some tests, and my biggest issue was the good old out-of-memory error when running them locally with Qwen2.5-0.5B-Instruct.

Is there any benefit in using a larger model, or would it also be feasible to use a smaller instruction-tuned model like HuggingFaceTB/SmolLM-135M-Instruct? A model that small would probably not slow things down much and is easier to test on most local machines.
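For instance, a trainer test could point at the smaller model roughly like this (a hypothetical pytest-style sketch; the dataset, step counts, and settings are illustrative, not taken from the TRL test suite):

```python
# Hypothetical sketch of a trainer test running on a small instruct model;
# everything except the SmolLM id is illustrative.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

def test_sft_with_small_instruct_model(tmp_path):
    dataset = Dataset.from_dict({"text": ["Hello, world!"] * 8})  # toy data
    training_args = SFTConfig(
        output_dir=str(tmp_path),
        max_steps=2,
        per_device_train_batch_size=2,
        dataset_text_field="text",
        report_to="none",
    )
    trainer = SFTTrainer(
        model="HuggingFaceTB/SmolLM-135M-Instruct",  # instead of "gpt2"
        args=training_args,
        train_dataset=dataset,
    )
    trainer.train()  # should fit comfortably in memory on a laptop
```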

johko • Oct 10 '24 20:10

Sorry for the late reply. No, there isn't any benefit to running the tests with larger models. Ideally, we want to use tiny models for testing.

qgallouedec • Oct 17 '24 07:10

Hey @qgallouedec, just wanted to let you know that I'm still working on it. One more question: should the gpt2 model be replaced only in the actual trainer tests (i.e. CPO, BCO, DPO, etc.), or in all the tests? It also occurs in, e.g., the callback tests.

johko • Oct 26 '24 20:10

Hi, as I didn't see any PR, I thought you hadn't had time to deal with it and went ahead on my own :( I opted for a deeper rebuild, regenerating all the tiny models. I'll let you have a look at #2287. I'm always interested in feedback, though.
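Regenerating a whole family of tiny models can be done with a loop over architectures, generalizing the single-model sketch above (the source models and naming scheme here are assumptions, not the actual contents of #2287):

```python
# Illustrative sketch of regenerating several tiny test models in one pass;
# the model list and the naming scheme are hypothetical.
from transformers import (
    AutoTokenizer,
    LlamaConfig,
    LlamaForCausalLM,
    Qwen2Config,
    Qwen2ForCausalLM,
)

SOURCES = [
    ("Qwen/Qwen2.5-0.5B-Instruct", Qwen2Config, Qwen2ForCausalLM),
    ("HuggingFaceTB/SmolLM-135M-Instruct", LlamaConfig, LlamaForCausalLM),  # SmolLM uses the Llama architecture
]

for source_id, config_cls, model_cls in SOURCES:
    tokenizer = AutoTokenizer.from_pretrained(source_id)
    config = config_cls(
        vocab_size=len(tokenizer),
        hidden_size=16,
        num_hidden_layers=2,
        num_attention_heads=4,
        num_key_value_heads=2,
        intermediate_size=32,
    )
    model = model_cls(config)  # tiny, randomly initialized
    tiny_id = f"trl-internal-testing/tiny-{model_cls.__name__}"  # hypothetical naming
    model.push_to_hub(tiny_id)
    tokenizer.push_to_hub(tiny_id)
```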

qgallouedec • Oct 27 '24 17:10

> Hi, as I didn't see any PR, I thought you hadn't had time to deal with it and went ahead on my own :( I opted for a deeper rebuild, regenerating all the tiny models. I'll let you have a look at #2287. I'm always interested in feedback, though.

No problem. I had less time than I initially hoped and also didn't really keep you updated, so no worries. It's actually interesting for me to see how you're solving this ;)

johko • Oct 27 '24 20:10