Open-Assistant
Try Supervised Fine-Tuning on pseudo-QA-data
The first step in InstructGPT (https://openai.com/blog/instruction-following/) is supervised fine-tuning on human instruction data. Our website and bot are being built to collect this data. In the meantime, we can already try out whether and how it's possible to fine-tune LLMs on such data by substituting pseudo-data for the not-yet-collected data. One idea is to take a QA dataset (like SQuAD or Natural Questions) and convert it into instruction-response pairs, then run fine-tuning on top of that to get a feel for the dynamics of training.
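As a concrete starting point, the conversion could look something like this (a rough sketch assuming the Hugging Face `datasets` library and the standard `squad` schema; the prompt template is made up for illustration):

```python
# Sketch: turn SQuAD examples into instruction-response pairs for SFT.
from datasets import load_dataset

def squad_to_instruction(example):
    # The prompt template below is just an illustration, not a settled format.
    prompt = (
        "Answer the question based on the context below.\n\n"
        f"Context: {example['context']}\n\n"
        f"Question: {example['question']}"
    )
    # SQuAD stores answers as {"text": [...], "answer_start": [...]}.
    texts = example["answers"]["text"]
    return {"instruction": prompt, "response": texts[0] if texts else ""}

squad = load_dataset("squad", split="train")
pseudo_sft = squad.map(squad_to_instruction, remove_columns=squad.column_names)
print(pseudo_sft[0]["instruction"][:200])
```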
I can help with this
Amazing! Maybe it's best if you write down an initial roadmap here, like a step-by-step plan to reach the goal; then we can better see where potential problems might arise!
Here are the steps I am planning to take.
- Find a good pretrained model that is not too large (I believe a mid-sized T5 would be a nice choice)
- I will create two variants of the SQuAD dataset, one with the original labels and one with slightly corrupted labels (maybe via back-translation?)
- I will train the reward model, basically a linear layer with output dim = 2 on top of our pretrained model (see the sketch after this list)
- Then, using the reward model, I will try training the model with PPO
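To make the reward-model step concrete, here's a minimal sketch of that linear head on top of a T5 encoder (the model name, pooling choice, and example input are assumptions for illustration):

```python
# Sketch: reward model = linear head (output dim = 2, e.g. "original"
# vs. "corrupted" label) on top of a pretrained T5 encoder.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel

class RewardModel(nn.Module):
    def __init__(self, model_name="t5-base", num_labels=2):
        super().__init__()
        self.encoder = T5EncoderModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.d_model, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Mean-pool over non-padding tokens before the linear head.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = RewardModel()
batch = tokenizer(["question: ... answer: ..."], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 2)
```

For the PPO step itself, an existing implementation such as the `trl` library could be a starting point rather than writing it from scratch.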
Are we planning to use an encoder-decoder model like t5 or a decoder only model (or is this irrelevant at this stage)? I'm interested in understanding the choice of T5 vs a huggingface GPT2 implementation or OPT which has large pretrained models already available.
Architecturally, a decoder-only model is simpler, but an encoder-decoder setup allows an explicit separation of input and output. Ultimately, which one we'd rather use will come down to the numbers. I feel a bit like decoder-only models shine in their simplicity and versatility, but I have no clue.
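To make the distinction concrete, here's roughly how the two setups differ in a supervised fine-tuning step (a sketch using Hugging Face transformers; the model names are just examples, and the prompt-masking below is approximate):

```python
# Encoder-decoder vs. decoder-only: where the input/output separation lives.
from transformers import (
    AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer,
)

prompt, response = "Translate to German: Hello", "Hallo"

# Encoder-decoder (T5): input and target are separate sequences, so the
# loss is computed only on the target by construction.
t5_tok = AutoTokenizer.from_pretrained("t5-small")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
enc = t5_tok(prompt, return_tensors="pt")
t5_labels = t5_tok(response, return_tensors="pt").input_ids
t5_loss = t5(**enc, labels=t5_labels).loss

# Decoder-only (GPT-2): prompt and response are concatenated into one
# sequence; to train only on the response, the prompt positions must be
# masked out of the labels by hand (-100 is ignored by the loss).
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
ids = gpt_tok(prompt + " " + response, return_tensors="pt").input_ids
labels = ids.clone()
n_prompt = len(gpt_tok(prompt + " ").input_ids)  # approximate boundary
labels[:, :n_prompt] = -100
gpt_loss = gpt(input_ids=ids, labels=labels).loss
```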
The reason why I thought starting with T5 would be a good idea is that Flan-T5 outperforms OPT-IML. Also, once we have the training codebase available, we will be able to experiment with whatever architecture we want.
Oh, very interesting, I wasn't aware of those papers. It's pretty counterintuitive (to me) that the T5 vs. GPT architecture would make such a marked difference w.r.t. parameter efficiency; I'll have to look into it further!
Thanks both for explaining :)
@yk @bth5032 Please note that there are a few recent works on generating instruction data automatically. The results reported by the authors are very promising. For example, the paper SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions states the following:
We introduce SELF-INSTRUCT, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off its own generations.
Our pipeline generates instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model.
Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on SuperNaturalInstructions, on par with the performance of InstructGPT-001, which is trained with private user data and human annotations.
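In pseudocode, the pipeline they describe is roughly the following (a paraphrase, not their actual code; the three helpers are hypothetical stubs):

```python
# Paraphrased sketch of the SELF-INSTRUCT bootstrapping loop.
def generate_samples(model, prompts):
    # Prompt the model with in-context examples from the pool to produce
    # new (instruction, input, output) triples. Stubbed out here.
    return []

def filter_low_quality(candidates, existing):
    # Prune near-duplicates and malformed generations. Stubbed out here.
    return [c for c in candidates if c not in existing]

def finetune(model, data):
    # Supervised fine-tuning of the original model on the pool. Stub.
    return model

def self_instruct(model, seed_tasks, rounds=5):
    pool = list(seed_tasks)
    for _ in range(rounds):
        candidates = generate_samples(model, prompts=pool)
        pool.extend(filter_low_quality(candidates, existing=pool))
    return finetune(model, pool)
```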
The generated dataset is supposed to be published here, but it's not there yet.
A similar work by other researchers is Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. They also report strong results, and their dataset is here (https://github.com/orhonovich/unnatural-instructions).
I guess that this line of work could help us get started.
@mrcabbage972 thank you very much for the pointers. Do you think you could make a PR, create a new markdown file under docs/research/ or something like that, and start a collection of research works? The idea would be to have a place where we collect relevant papers, maybe a few tags for category, and a (short) description of what each paper could contribute to our efforts.
Sure, I can do that.
@yk Please see PR here.
Partially resolved in PR.
I am following up on your commit with the following:
- Adding a script for creating pseudo-data for sanity checks and mock trainings
- Adding PolyLoss support, which I heard the PaLM team found to outperform cross-entropy (see the sketch after this list)
- Adding Sharpness-Aware Minimization support
- Broader coverage of tasks
- Further optimization and extensions
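For the PolyLoss item, here is a minimal sketch of the Poly-1 variant (cross-entropy plus an ε-weighted (1 − p_t) term; the ε value is just illustrative):

```python
# Poly-1 loss sketch: CE + epsilon * (1 - p_t), following the PolyLoss paper.
import torch
import torch.nn.functional as F

def poly1_cross_entropy(logits, targets, epsilon=1.0):
    ce = F.cross_entropy(logits, targets, reduction="none")
    # p_t = probability the model assigns to the true class.
    probs = F.softmax(logits, dim=-1)
    pt = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return (ce + epsilon * (1.0 - pt)).mean()

logits = torch.randn(4, 10)            # batch of 4, 10 classes
targets = torch.randint(0, 10, (4,))
loss = poly1_cross_entropy(logits, targets)
```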
Please see this PR. I'd appreciate general feedback on the structure I'm proposing.
@ekurtulus I'd like to help out on this issue. Any things you want me to look at? I guess once you create a new PR (replacing #576), we can go from there.
Great! @sanagno is also working on this. I think at some point we need to create a general roadmap of the features we want by the end of this month. @yk, any opinion on this?
@ekurtulus there are a few things/plans going on in the Discord server. I'm not sure if I have seen you there :))
Is it the ml-models channel?
Mostly yes, we can sync from there!
Is that Open-Assistant discord or LAION discord? I just see the 'Lobby' channel in Open-Assistant discord.
@lakshaykc what is your discord name? I can give you more access I believe
It is 'lkc'. Thanks.
Please feel free to reopen this issue and ping me if the need for other tasks or models arises.
Why was this closed? Is there a different issue related to this?
I think the original purpose of this issue has been served. See #619 for the latest updates on the code. Once we get the human data, we will continue with supervised fine-tuning on that. If you think it's better to continue the discussion here, we can re-open it.
Ok. I didn't see #619. Makes sense to just continue over there. We should update #200 as it mentions #48.