SDF, how does it work?

Open qrdlgit opened this issue 2 years ago • 5 comments

I read the technical report, but there wasn't much info about the SDF. How does it work?

Is the intention to release a more detailed paper soon or are you folks considering keeping this as closed?

qrdlgit avatar May 24 '23 01:05 qrdlgit

We'll release the code soon. It's actually very simple. Just ask ChatGPT to pick the best response and use that to fine-tune Baize.

JetRunner avatar May 24 '23 21:05 JetRunner

Is that really self-distillation? It sounds more like synthetic data generation and you're still distilling ChatGPT into the model.

Don't get me wrong, it's a great approach and lets people distill with much less contamination, but I think the title is a bit confusing and buries the lede.

qrdlgit avatar May 25 '23 01:05 qrdlgit

But in SDF, all four responses are generated by the Baize model itself. ChatGPT only helps choose which one to use. That's why we call it "self-distillation with feedback".

JetRunner avatar May 25 '23 03:05 JetRunner
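
For readers following along, here is a minimal sketch of the loop as described in this thread: the model samples several candidate responses, an external judge picks one, and the chosen pairs become fine-tuning data. This is not the released Baize code. Every name here (`build_sdf_dataset`, `generate`, `judge`, the toy stand-ins) is hypothetical, and the real system would call Baize for generation and ChatGPT (or a human annotator) for the choice.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interfaces, assumed for illustration only:
#   Generator: prompt -> one sampled response from the model being trained
#   Judge:     (prompt, candidates) -> index of the preferred candidate
Generator = Callable[[str], str]
Judge = Callable[[str, List[str]], int]


def build_sdf_dataset(
    prompts: List[str],
    generate: Generator,
    judge: Judge,
    num_candidates: int = 4,  # the thread mentions four responses per prompt
) -> List[Tuple[str, str]]:
    """Collect (prompt, chosen_response) pairs for a later fine-tuning step."""
    dataset = []
    for prompt in prompts:
        # All candidates come from the model itself ("self-distillation").
        candidates = [generate(prompt) for _ in range(num_candidates)]
        # The judge only selects; it never writes a response ("feedback").
        best = judge(prompt, candidates)
        if not 0 <= best < len(candidates):
            raise ValueError(f"judge returned invalid index {best}")
        dataset.append((prompt, candidates[best]))
    return dataset


def make_toy_generator(seed: int = 0) -> Generator:
    """Stand-in for sampling from the model; returns seeded random drafts."""
    rng = random.Random(seed)

    def generate(prompt: str) -> str:
        return f"{prompt} -> draft {rng.randint(0, 999)}"

    return generate


def longest_judge(prompt: str, candidates: List[str]) -> int:
    """Stand-in for the external judge; simply prefers the longest candidate."""
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


if __name__ == "__main__":
    pairs = build_sdf_dataset(
        ["What is 2 + 2?", "Name a prime number."],
        make_toy_generator(),
        longest_judge,
    )
    for prompt, chosen in pairs:
        print(prompt, "=>", chosen)
```

Because the judge is just a callable, the same loop works whether the preference comes from ChatGPT or from a person, and the resulting pairs would then feed an ordinary supervised fine-tuning run.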

Right, but it's the intelligence of ChatGPT you're distilling into your model.

If a child learning math gives 4 answers to a math question, and then a teacher tells him what's correct, you would not say the child is self-learning.

Note, this isn't to criticize. What you've done is hyper cool. I just think it could be made clearer.

And for what it's worth, I regularly spam your project in a bunch of different places. You folks are doing some of the coolest things, imho.

qrdlgit avatar May 25 '23 03:05 qrdlgit

None taken. You're right, because that's indeed our motivation: besides SFT, to find another way to learn from ChatGPT. The ChatGPT feedback can also be substituted with human preferences. We'll think again about the name, but we may not be able to update it due to the EMNLP anonymity period. Thanks for your comments!

JetRunner avatar May 25 '23 03:05 JetRunner