
Architecture of ALBEF

Open Asaad-Pak opened this issue 6 months ago • 3 comments

Hello, I would like to run some experiments with the ALBEF model. I have read your paper, but I do not understand why the first six layers of BERT-base are used as the text encoder and the last six layers as the multimodal encoder. Why wasn't the entire 12-layer BERT-base used for both the text encoder and the multimodal encoder? Any help would be greatly appreciated. @LiJunnan1992 @svc-scm @chenxwh
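To make the question concrete, here is a minimal sketch of the layer split as I understand it from the paper. It is only an illustration, not the repository's code: `LayerPlan`, `plan_layers`, and `FUSION_LAYER` are hypothetical names, and the assumption that the last six layers are the ones carrying cross-attention over the image features is my reading of the paper.

```python
from dataclasses import dataclass

NUM_LAYERS = 12   # depth of BERT-base
FUSION_LAYER = 6  # assumed index at which the multimodal layers begin


@dataclass(frozen=True)
class LayerPlan:
    index: int
    role: str                  # "text" or "multimodal"
    has_cross_attention: bool  # True if the layer also attends to image features


def plan_layers(num_layers=NUM_LAYERS, fusion_layer=FUSION_LAYER):
    """Assign each transformer layer a role based on its index (hypothetical helper)."""
    plans = []
    for i in range(num_layers):
        multimodal = i >= fusion_layer
        role = "multimodal" if multimodal else "text"
        plans.append(LayerPlan(index=i, role=role, has_cross_attention=multimodal))
    return plans


layers = plan_layers()
text_encoder = [p.index for p in layers if p.role == "text"]
multimodal_encoder = [p.index for p in layers if p.role == "multimodal"]
print("text encoder layers:      ", text_encoder)
print("multimodal encoder layers:", multimodal_encoder)
```

In this sketch the alternative I am asking about would correspond to two separate 12-layer stacks, rather than a single 12-layer stack split at `FUSION_LAYER`.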

Asaad-Pak • Aug 21 '24 14:08