
Question about replicating bert ranker

Open daje0601 opened this issue 3 years ago • 8 comments

Hi, I am currently trying to pretrain the bert-ranker model of BlenderBot 2.0 on convai2 (the PersonaChat dataset, version both_original), with the goal of replicating the reported results.

  • My script
!python -m parlai.scripts.train_model -t convai2:both -m bert_ranker/bi_encoder_ranker --batchsize 20 -vtim 30 \
--init-model zoo:bert/model \
--model-file content/drive/MyDrive/bi_encoder --data-parallel True
  • Q1 The train_accuracy looks wrong. The linked page says the accuracy should be 0.8686, but I am getting a strange value of 0.06167. Can I simply train longer? I'm not sure what mistake I made. Could you please help?
06:57:42 | time:3765s total_exs:34820 total_steps:1741 epochs:0.26
    clen  : 136, clip : 1, ctpb : 2751, ctps : 7214, ctrunc  : .008333   
    ctrunclen : .4633, exps : 52.44, exs : 600, gnorm : 47.02, gpu_mem : .6626   
    llen : 14.08, lr : 5e-05, ltpb : 281.6, ltps : 738.5, ltrunc : 0, ltrunclen : 0
    mean_loss : 3.738, mrr : .1876, rank : 10.45, total_train_updates : 1741
    train_accuracy :  .06167, tpb : 3033, tps : 7952, ups : 2.623
  • Q2 To build bb2's bi-encoder, is it enough to fine-tune the base BERT on convai2:normalized? And is the script I wrote above the right way to do that?

  • Q3 Would the results be much different if I used the convai2 both version instead of the self version? I saw #4581 and found which version of the bb2 bert ranker was used in that issue.

Thank you for your kind reply.

daje0601 avatar Jul 19 '22 07:07 daje0601

Q1: You've specified the incorrect --init-model --> try doing the following:

--init-model zoo:pretrained_transformers/bi_model_huge_reddit/model -m transformer/ranker
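
Folding that into the original command, a minimal sketch of the full call might look like the following (the batch size and --model-file path are simply carried over from the script above, --data-parallel is dropped, and the flags should be verified against your ParlAI version; the eval line at the end is just one way to check ranking accuracy against the reported number):

# the original training command with the suggested --init-model and ranker agent swapped in
python -m parlai.scripts.train_model -t convai2:both -m transformer/ranker \
--init-model zoo:pretrained_transformers/bi_model_huge_reddit/model \
--batchsize 20 -vtim 30 --model-file content/drive/MyDrive/bi_encoder

# afterwards, check hits@1 / accuracy on the validation set using inline candidates
python -m parlai.scripts.eval_model -t convai2:both --eval-candidates inline \
--model-file content/drive/MyDrive/bi_encoder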

Q2: I'm not sure I fully understand - BB2 is a generative model and does not employ BERT

Q3: It's up to you - both may yield slightly better results, but we generally use self (as it more accurately reflects real applications)
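
To make that difference concrete, the only change between the two setups is the task string passed to -t (teacher names as used earlier in this thread; you can inspect them with display_data to confirm they exist in your ParlAI version):

# "self" variant: only the bot's own persona appears in the context
python -m parlai.scripts.display_data -t convai2:self
# "both" variant: both speakers' personas appear in the context
python -m parlai.scripts.display_data -t convai2:both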

klshuster avatar Jul 21 '22 17:07 klshuster

Thank you for your kind answer. I fully understood Q1 and Q3.

I'm not fluent in English, so my question was not clear.

By Question 2, I meant: could you please check whether the dataset required to pretrain the BERT marked with the orange box (in the attached image) is convai2, or reddit & convai2?

daje0601 avatar Jul 22 '22 02:07 daje0601

We use a DPR model that is pre-trained on a suite of knowledge-intensive QA tasks and is fine-tuned in a RAG setup on Wizard of Wikipedia. So, no reddit training is involved
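
For anyone trying to reproduce that retriever, the "fine-tuned in a RAG setup on Wizard of Wikipedia" step corresponds roughly to a command like the one below. This is only a sketch: the -m rag agent, its retriever flags, and the zoo path for the QA-pretrained DPR checkpoint are assumptions based on the ParlAI hallucination/RAG project and should be checked against that project's README:

# rough sketch: RAG fine-tuning on Wizard of Wikipedia, starting from a DPR retriever
# pre-trained on knowledge-intensive QA tasks (zoo path assumed, please verify)
python -m parlai.scripts.train_model -t wizard_of_wikipedia -m rag \
--rag-model-type token --rag-retriever-type dpr --generation-model bart \
--dpr-model-file zoo:hallucination/multiset_dpr/hf_bert_base.cp \
--model-file /tmp/rag_wow --batchsize 16 -vtim 30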

klshuster avatar Jul 25 '22 16:07 klshuster

Um, am I understanding this correctly?

  • Blenderbot2

    • Ready
      No | component                 | pre-training | model | dataset
      1  | query generator           | O            | BART  | Wizard_of_Internet (WizInt)
      2  | Long Term memory reader   | O            | bert  | Wizard_of_Wikipedia (WoW)
      3  | Long Term memory writer   | X            | bert  | X
      4  | Long Term memory decoder  | X            | gpt2  | X
    • fine-tuning: Blenderbot2 (MSC, WizInt, safety (BAD))
  • Blenderbot1

    • Ready
      No | component     | pre-training | model                | dataset
      1  | Poly-Encoder  | O            | transformers encoder | Reddit (~2017.09, comments)
    • fine-tuning: Blenderbot1 (BST, WoW, convai2:normalized, empathetic_dialogues)

I read almost all the issues, and there were a lot of questions about bb2's components, structure, required datasets, and script commands. I wanted to help, so I made the tables above for bb2 and bb1 so that you get fewer questions like this. I hope I understood correctly and that the tables are helpful for people new to BlenderBot. Thank you so much for always providing good papers and open source.

Sincerely, thank you!

daje0601 avatar Jul 26 '22 03:07 daje0601

BlenderBot 1 is an encoder-decoder transformer, not a poly-encoder

BlenderBot 2's long term memory decoder is also BART and was trained on MSC. Everything else looks to be correct

klshuster avatar Aug 04 '22 21:08 klshuster

I made a terrible mistake.

  • Blenderbot2
    No | component                 | pre-training | model | dataset
    1  | query generator           | O            | BART  | Wizard_of_Internet (WizInt)
    2  | Long Term memory reader   | O            | bert  | Wizard_of_Wikipedia (WoW)
    3  | Long Term memory writer   | X            | bert  | X
    4  | Long Term memory decoder  | X            | bart  | X
  • Blenderbot1
    No | component                    | pre-training | model                   | dataset
    1  | Poly-Encoder, bart (decoder) | O            | customized transformers | Reddit (~2017.09, comments)

Now I have some understanding of this project. Every time I hear from you, I feel great. Thank you so much!

daje0601 avatar Aug 05 '22 01:08 daje0601

BlenderBot 1 does not use BART; it uses a different architecture with a different pre-training objective, and the original paper outlines these in depth. It's simply a transformer seq2seq model

klshuster avatar Aug 08 '22 22:08 klshuster

BlenderBot 1 does not use BART; it uses a different architecture with a different pre-training objective, and the original paper outlines these in depth. It's simply a transformer seq2seq model

This is the structure as I understood it, and it is what I wanted to ask about (see the attached image). Maybe I misunderstood?

daje0601 avatar Aug 10 '22 10:08 daje0601

That is the retrieve-and-refine architecture; BlenderBot 1 is a purely generative model. However, we considered retrieve-and-refine in the original tech report as an alternative

klshuster avatar Aug 16 '22 14:08 klshuster

  • Q1 Isn't Reddit_3B a model pre-trained on reddit data as a poly-encoder?

  • Q2 (see the attached image) This picture shows the BlenderBot 1 fine-tuning script command; the Reddit_3B model is included in the fine-tuning. That is why I thought bb1 was a poly-encoder + transformer. I am very confused, though, because you said it is a purely generative model. Sorry for the inconvenience, but please have mercy on me.

daje0601 avatar Aug 18 '22 01:08 daje0601

bb1 was pre-trained on reddit as well, as a generative language model
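
In other words, Reddit_3B is the generatively pre-trained seq2seq checkpoint, and BlenderBot 1 is obtained by fine-tuning it on the BST tasks listed in your table. A rough sketch of that step (the zoo path and flags are assumptions drawn from this thread and the ParlAI recipes project, not the exact released command; the architecture and dictionary flags matching the 3B checkpoint also need to be supplied and are omitted here):

# rough sketch: fine-tuning the reddit-pretrained generative model on the BST tasks
python -m parlai.scripts.train_model \
-t blended_skill_talk,wizard_of_wikipedia,convai2:normalized,empathetic_dialogues \
-m transformer/generator --init-model zoo:blender/reddit_3B/model \
--model-file /tmp/bb1_finetune --batchsize 16 -vtim 30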

klshuster avatar Aug 18 '22 18:08 klshuster

bb1 was pre-trained on reddit as well, as a generative language model

Thank you very much for your reply.

daje0601 avatar Aug 19 '22 01:08 daje0601