
Try Supervised Fine-Tuning on pseudo-QA-data

Open yk opened this issue 2 years ago • 14 comments

The first step in InstructGPT (https://openai.com/blog/instruction-following/) is supervised fine-tuning on human instruction data. Our website and bot are being created to collect this data. Meanwhile, we can already try out whether and how it's possible to fine-tune LLMs on such data by substituting pseudo-data for the not-yet-collected data. One idea is to take a QA dataset (like SQuAD or Natural Questions), convert it into instruction-response pairs, and then run the fine-tuning on top of that to get a feel for the dynamics of training.
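The conversion step could look roughly like this. This is just a sketch: the field names follow the SQuAD v1.1 layout (`context`, `question`, `answers.text`), and the prompt template is one arbitrary choice among many.

```python
def squad_to_instruction_pairs(examples):
    """Map SQuAD-style {context, question, answers} records to
    (instruction, response) pairs for supervised fine-tuning."""
    pairs = []
    for ex in examples:
        instruction = (
            "Answer the question using the context below.\n\n"
            f"Context: {ex['context']}\n"
            f"Question: {ex['question']}"
        )
        # Take the first gold answer span as the target response.
        response = ex["answers"]["text"][0]
        pairs.append({"instruction": instruction, "response": response})
    return pairs
```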

yk avatar Dec 21 '22 16:12 yk

I can help with this

ekurtulus avatar Dec 23 '22 14:12 ekurtulus

Amazing! Maybe it's best if you write down an initial roadmap here, like a step-by-step plan to reach the goal, then we can see better where potential problems might be!

yk avatar Dec 23 '22 17:12 yk

Here are the steps I am planning to take.

  1. Find a good pretrained model that is not too large (I believe a mid-sized T5 would be a nice choice)
  2. Create two different variants of the SQuAD dataset, one with the original labels and one with slightly corrupted labels (maybe via back-translation?)
  3. Train the reward model, basically a linear layer with output dim=2 on top of our pretrained model
  4. Then, using the reward model, try training the model with PPO
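Step 3 above can be sketched in a few lines. Everything here is illustrative: the backbone (e.g. a T5 encoder) is omitted and random vectors stand in for its pooled hidden states; only the two-output linear head and the probability readout are shown.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 16

# Linear head: hidden_size -> 2 (e.g. logits for "original" vs "corrupted")
W = rng.normal(scale=0.02, size=(hidden_size, 2))
b = np.zeros(2)

def reward_head(pooled):
    """pooled: (batch, hidden_size) backbone features -> (batch, 2) logits."""
    return pooled @ W + b

def quality_prob(pooled):
    """Softmax over the two logits; index 0 = P(response has original label)."""
    logits = reward_head(pooled)
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p[:, 0]

# Stand-in for pooled encoder outputs of a batch of 4 responses.
feats = rng.normal(size=(4, hidden_size))
probs = quality_prob(feats)
```

The scalar `quality_prob` output is what a PPO loop would then use as the reward signal.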

ekurtulus avatar Dec 26 '22 12:12 ekurtulus

Are we planning to use an encoder-decoder model like t5 or a decoder only model (or is this irrelevant at this stage)? I'm interested in understanding the choice of T5 vs a huggingface GPT2 implementation or OPT which has large pretrained models already available.

bth5032 avatar Dec 26 '22 22:12 bth5032

> Are we planning to use an encoder-decoder model like t5 or a decoder only model (or is this irrelevant at this stage)? I'm interested in understanding the choice of T5 vs a huggingface GPT2 implementation or OPT which has large pretrained models already available.

architecturally, a decoder-only model is simpler, but an encoder-decoder setup allows for explicit separation of input and output. ultimately, the numbers will decide which one we'd rather use. I feel a bit like decoder-only models shine in their simplicity and versatility, but I have no clue.
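The practical difference for fine-tuning data shows up in how examples are packed. A rough sketch, using the common Hugging Face/PyTorch convention that label positions set to -100 are ignored by the cross-entropy loss (token ids below are made up for illustration):

```python
IGNORE_INDEX = -100  # convention: positions with this label are skipped by the loss

def decoder_only_example(prompt_ids, response_ids):
    """Decoder-only: prompt and response form one sequence;
    the loss is masked over the prompt tokens."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

def encoder_decoder_example(prompt_ids, response_ids):
    """Encoder-decoder: input and output stay in separate sequences,
    so no masking is needed to separate them."""
    return {"input_ids": list(prompt_ids), "labels": list(response_ids)}
```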

yk avatar Dec 26 '22 23:12 yk

The reason why I thought starting with T5 would be a good idea is that Flan-T5 outperforms OPT-IML. Also, once we have the training codebase available, we will be able to experiment with whatever architecture we want.

ekurtulus avatar Dec 27 '22 06:12 ekurtulus

> The reason why I thought starting with T5 would be a good idea is that Flan-T5 outperforms OPT-IML. Also, once we have the training codebase available, we will be able to experiment with whatever architecture we want.

Oh very interesting, wasn't aware of those papers. It's pretty counterintuitive (to me) that T5 vs GPT architecture would make such a marked difference w.r.t. parameter efficiency; will have to look into it further!

Thanks both for explaining :)

bth5032 avatar Dec 27 '22 08:12 bth5032

@yk @bth5032 Please note that there are a few recent works on generating instruction data automatically. The results reported by the authors are very promising. For example, the paper SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions states the following:

> We introduce SELF-INSTRUCT, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off its own generations.
>
> Our pipeline generates instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model.
>
> Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on SuperNaturalInstructions, on par with the performance of InstructGPT-001, which is trained with private user data and human annotations.

The generated dataset is supposed to be published here, but it's not there yet.

A similar work by other researchers is Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. They also report strong results, and their dataset is here (https://github.com/orhonovich/unnatural-instructions).

I guess that this line of work could help us get started.

mrcabbage972 avatar Dec 29 '22 15:12 mrcabbage972

@mrcabbage972 thank you very much for the pointers. do you think you could make a PR, creating a new markdown file under docs/research/ or something like this, and start a collection of research works? The idea would be to have a place where we collect relevant papers, maybe with a few category tags, and a short description of what each paper could contribute to our efforts.

yk avatar Dec 30 '22 20:12 yk

Sure, I can do that.

mrcabbage972 avatar Dec 31 '22 04:12 mrcabbage972

@yk Please see PR here.

mrcabbage972 avatar Dec 31 '22 21:12 mrcabbage972

Partially resolved in PR.

sanagno avatar Jan 03 '23 21:01 sanagno

I am following up on your commit with the following:

  • Adding a script for creating pseudo-data for sanity checks and mock training runs
  • Adding PolyLoss support, which I heard the PaLM team found to outperform cross-entropy
  • Adding Sharpness-Aware Minimization support
  • Broader coverage of tasks
  • Further optimization and extensions
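For reference, the simplest PolyLoss variant (Poly-1, from Leng et al., 2022) is just cross-entropy plus an extra term in (1 - p_t), where p_t is the predicted probability of the true class. A minimal numpy sketch; the epsilon value here is only an example, not a recommendation from this thread:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def poly1_loss(logits, targets, eps=2.0):
    """Mean Poly-1 loss over (batch, n_classes) logits:
    CE + eps * (1 - p_t). With eps=0 this reduces to plain CE."""
    probs = softmax(logits)
    pt = probs[np.arange(len(targets)), targets]  # prob of the true class
    ce = -np.log(pt)
    return float(np.mean(ce + eps * (1.0 - pt)))
```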

ekurtulus avatar Jan 08 '23 20:01 ekurtulus

Please see this PR. I would appreciate general feedback on the structure that I propose.

ekurtulus avatar Jan 09 '23 19:01 ekurtulus

@ekurtulus I'd like to help out on this issue. Any things you want me to look at? I guess once you create a new PR (replacing #576), we can go from there.

lakshaykc avatar Jan 11 '23 00:01 lakshaykc

> @ekurtulus I'd like to help out on this issue. Any things you want me to look at? I guess once you create a new PR (replacing #576), we can go from there.

Great! @sanagno is also working on this. I think at some point we need to create a general roadmap of the features we want by the end of this month. @yk, any opinion on this?

ekurtulus avatar Jan 11 '23 08:01 ekurtulus

@ekurtulus there are a few things/plans going on in the Discord server. I am not sure if I have seen you there :))

sanagno avatar Jan 11 '23 09:01 sanagno

> @ekurtulus there are a few things/plans going on in the Discord server. I am not sure if I have seen you there :))

Is it the ml-models channel ?

ekurtulus avatar Jan 11 '23 09:01 ekurtulus

Mostly yes, we can sync from there!

sanagno avatar Jan 11 '23 09:01 sanagno

> @ekurtulus there are a few things/plans going on in the Discord server. I am not sure if I have seen you there :))

Is that Open-Assistant discord or LAION discord? I just see the 'Lobby' channel in Open-Assistant discord.

lakshaykc avatar Jan 11 '23 18:01 lakshaykc

@lakshaykc what is your Discord name? I believe I can give you more access

sanagno avatar Jan 11 '23 18:01 sanagno

> @lakshaykc what is your discord name? I can give you more access I believe

It is 'lkc'. Thanks.

lakshaykc avatar Jan 11 '23 18:01 lakshaykc

Please feel free to reopen this issue and ping me if the need for other tasks or models arises.

ekurtulus avatar Jan 11 '23 18:01 ekurtulus

Why was this closed? Is there a different issue related to this?

lakshaykc avatar Jan 11 '23 18:01 lakshaykc

The original purpose of the issue is, I think, served. See #619 for the latest updates on the code. Once we get the human data, we will continue with supervised fine-tuning on it. If you believe the discussion is better continued here, we can re-open this.

sanagno avatar Jan 11 '23 18:01 sanagno

Ok. I didn't see #619. Makes sense to just continue over there. We should update #200 as it mentions #48.

lakshaykc avatar Jan 11 '23 18:01 lakshaykc