
Question about replicating bert ranker

Open daje0601 opened this issue 3 years ago • 8 comments

Hi, I am currently trying to pretrain the bert-ranker model of BlenderBot 2.0 on convai2 (the PersonaChat dataset, version both_original), with the goal of replicating the reported results.

  • My script
!python -m parlai.scripts.train_model -t convai2:both -m bert_ranker/bi_encoder_ranker --batchsize 20 -vtim 30 \
--init-model zoo:bert/model \
--model-file content/drive/MyDrive/bi_encoder --data-parallel True
  • Q1 The train_accuracy looks wrong. The linked page says the accuracy should be 0.8686, but I am getting a strange value of 0.06167. Can I simply train longer? I'm not sure what mistake I made. Could you please help?
06:57:42 | time:3765s total_exs:34820 total_steps:1741 epochs:0.26
    clen  : 136, clip : 1, ctpb : 2751, ctps : 7214, ctrunc  : .008333   
    ctrunclen : .4633, exps : 52.44, exs : 600, gnorm : 47.02, gpu_mem : .6626   
    llen : 14.08, lr : 5e-05, ltpb : 281.6, ltps : 738.5, ltrunc : 0, ltrunclen : 0
    mean_loss : 3.738, mrr : .1876, rank : 10.45, total_train_updates : 1741
    train_accuracy :  .06167, tpb : 3033, tps : 7952, ups : 2.623
  • Q2 To build bb2's bi-encoder, is it enough to fine-tune the base BERT on convai2:normalized? And is the script I wrote above the right way to do that?

  • Q3 Would the results be much different if I used the convai2 both version instead of the self version? I saw #4581 and found which version of the bb2 bert ranker was used in that issue.

Thank you for your kind reply.

daje0601 avatar Jul 19 '22 07:07 daje0601

Q1: You've specified the incorrect --init-model --> try doing the following:

--init-model zoo:pretrained_transformers/bi_model_huge_reddit/model -m transformer/ranker
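
Folding that into the original command, a minimal sketch of the full call might look like the following (the batch size and --model-file path are simply carried over from the script above, --data-parallel is dropped, and the flags should be verified against your ParlAI version; the eval line at the end is just one way to check ranking accuracy against the reported number):

# the original training command with the suggested --init-model and ranker agent swapped in
python -m parlai.scripts.train_model -t convai2:both -m transformer/ranker \
--init-model zoo:pretrained_transformers/bi_model_huge_reddit/model \
--batchsize 20 -vtim 30 --model-file content/drive/MyDrive/bi_encoder

# afterwards, check hits@1 / accuracy on the validation set using inline candidates
python -m parlai.scripts.eval_model -t convai2:both --eval-candidates inline \
--model-file content/drive/MyDrive/bi_encoder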

Q2: I'm not sure I fully understand - BB2 is a generative model and does not employ BERT

Q3: It's up to you - both may yield slightly better results, but we generally use self (as it more accurately reflects real applications)
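
To make that difference concrete, the only change between the two setups is the task string passed to -t (teacher names as used earlier in this thread; you can inspect them with display_data to confirm they exist in your ParlAI version):

# "self" variant: only the bot's own persona appears in the context
python -m parlai.scripts.display_data -t convai2:self
# "both" variant: both speakers' personas appear in the context
python -m parlai.scripts.display_data -t convai2:both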

klshuster avatar Jul 21 '22 17:07 klshuster

Thank you for your kind answer. I fully understood Q1 and Q3.

I'm not fluent in English, so my question was not clear.

By Question 2, I meant: could you please check whether the dataset required to pretrain the BERT marked with the orange box (in the attached image) is convai2, or reddit & convai2?

daje0601 avatar Jul 22 '22 02:07 daje0601

We use a DPR model that is pre-trained on a suite of knowledge-intensive QA tasks and is fine-tuned in a RAG setup on Wizard of Wikipedia. So, no reddit training is involved
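
For anyone trying to reproduce that retriever, the "fine-tuned in a RAG setup on Wizard of Wikipedia" step corresponds roughly to a command like the one below. This is only a sketch: the -m rag agent, its retriever flags, and the zoo path for the QA-pretrained DPR checkpoint are assumptions based on the ParlAI hallucination/RAG project and should be checked against that project's README:

# rough sketch: RAG fine-tuning on Wizard of Wikipedia, starting from a DPR retriever
# pre-trained on knowledge-intensive QA tasks (zoo path assumed, please verify)
python -m parlai.scripts.train_model -t wizard_of_wikipedia -m rag \
--rag-model-type token --rag-retriever-type dpr --generation-model bart \
--dpr-model-file zoo:hallucination/multiset_dpr/hf_bert_base.cp \
--model-file /tmp/rag_wow --batchsize 16 -vtim 30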

klshuster avatar Jul 25 '22 16:07 klshuster

Um, am I understanding this correctly?

  • Blenderbot2

    • Ready
      No | component                 | pre-training | model | dataset
      1  | query generator           | O            | BART  | Wizard_of_Internet (WizInt)
      2  | Long Term memory reader   | O            | bert  | Wizard_of_Wikipedia (WoW)
      3  | Long Term memory writer   | X            | bert  | X
      4  | Long Term memory decoder  | X            | gpt2  | X
    • fine-tuning: Blenderbot2 (MSC, WizInt, safety (BAD))
  • Blenderbot1

    • Ready
      No | component     | pre-training | model                | dataset
      1  | Poly-Encoder  | O            | transformers encoder | Reddit (~2017.09, comments)
    • fine-tuning: Blenderbot1 (BST, WoW, convai2:normalized, empathetic_dialogues)

I read almost all the issues, and there were a lot of questions about bb2's components, structure, required datasets, and script commands. I wanted to help, so I made the tables above for bb2 and bb1 so that you get fewer questions like this. I hope I understood correctly and that the tables are helpful for people new to BlenderBot. Thank you so much for always providing good papers and open source.

Sincerely, thank you!

daje0601 avatar Jul 26 '22 03:07 daje0601

BlenderBot 1 is an encoder-decoder transformer, not a poly-encoder

BlenderBot 2's long term memory decoder is also BART and was trained on MSC. Everything else looks to be correct

klshuster avatar Aug 04 '22 21:08 klshuster

I made a terrible mistake.

  • Blenderbot2
    No | component                 | pre-training | model | dataset
    1  | query generator           | O            | BART  | Wizard_of_Internet (WizInt)
    2  | Long Term memory reader   | O            | bert  | Wizard_of_Wikipedia (WoW)
    3  | Long Term memory writer   | X            | bert  | X
    4  | Long Term memory decoder  | X            | bart  | X
  • Blenderbot1
    No | component                    | pre-training | model                   | dataset
    1  | Poly-Encoder, bart (decoder) | O            | customized transformers | Reddit (~2017.09, comments)

Now I have some understanding of this project. Every time I hear from you, I feel great. Thank you so much!

daje0601 avatar Aug 05 '22 01:08 daje0601

BlenderBot 1 does not use BART; it uses a different architecture with a different pre-training objective, and the original paper outlines these in depth. It's simply a transformer seq2seq model

klshuster avatar Aug 08 '22 22:08 klshuster

BlenderBot 1 does not use BART; it uses a different architecture with a different pre-training objective, and the original paper outlines these in depth. It's simply a transformer seq2seq model

This is the structure as I understood it, and it is what I wanted to ask about (see the attached image). Maybe I misunderstood?

daje0601 avatar Aug 10 '22 10:08 daje0601

That is the retrieve-and-refine architecture; BlenderBot 1 is a purely generative model. However, we considered retrieve-and-refine in the original tech report as an alternative

klshuster avatar Aug 16 '22 14:08 klshuster

  • Q1 Isn't Reddit_3B a model pre-trained on reddit data as a poly-encoder?

  • Q2 (see the attached image) This picture shows the BlenderBot 1 fine-tuning script command; the Reddit_3B model is included in the fine-tuning. That is why I thought bb1 was a poly-encoder + transformer. I am very confused, though, because you said it is a purely generative model. Sorry for the inconvenience, but please have mercy on me.

daje0601 avatar Aug 18 '22 01:08 daje0601

bb1 was pre-trained on reddit as well, as a generative language model
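
In other words, Reddit_3B is the generatively pre-trained seq2seq checkpoint, and BlenderBot 1 is obtained by fine-tuning it on the BST tasks listed in your table. A rough sketch of that step (the zoo path and flags are assumptions drawn from this thread and the ParlAI recipes project, not the exact released command; the architecture and dictionary flags matching the 3B checkpoint also need to be supplied and are omitted here):

# rough sketch: fine-tuning the reddit-pretrained generative model on the BST tasks
python -m parlai.scripts.train_model \
-t blended_skill_talk,wizard_of_wikipedia,convai2:normalized,empathetic_dialogues \
-m transformer/generator --init-model zoo:blender/reddit_3B/model \
--model-file /tmp/bb1_finetune --batchsize 16 -vtim 30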

klshuster avatar Aug 18 '22 18:08 klshuster

bb1 was pre-trained on reddit as well, as a generative language model

Thank you very much for your reply.

daje0601 avatar Aug 19 '22 01:08 daje0601