
Question about BEiT-3 pre-training

Open ShiJiawenwen opened this issue 1 year ago • 1 comment

Thank you very much for your work. I have recently been studying the BEiT-3 model, and I have a few questions I would appreciate answers to:

  1. Is the training dataset shuffled during the training phase? Does it include images, text, and image-text pairs?
  2. If the dataset is mixed, how are the different experts selected during training? An MoE structure typically includes a gating network, but this model does not appear to have one. I look forward to your response.
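
For context on the second question, below is a minimal sketch of how modality-based expert selection in a Multiway Transformer FFN could look, assuming routing is determined by each token's modality rather than a learned gate. The names (`MultiwayFFN`, `VISION`, `LANGUAGE`, `modality_ids`) are illustrative and are not the repository's actual API:

```python
import torch
import torch.nn as nn

VISION, LANGUAGE = 0, 1  # hypothetical modality ids assigned at the embedding stage


class MultiwayFFN(nn.Module):
    """Feed-forward layer with one expert per modality.

    Routing is deterministic (by token modality), so no learned gating
    network is required, unlike a conventional MoE layer.
    """

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(2)  # one vision expert, one language expert
        ])

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); modality_ids: (batch, seq) with values in {VISION, LANGUAGE}
        out = torch.zeros_like(x)
        for m, expert in enumerate(self.experts):
            mask = modality_ids == m  # select tokens belonging to this modality
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

Under this assumption, an image-text pair would send its image tokens through the vision expert and its text tokens through the language expert, while the shared self-attention still operates over all tokens together.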

ShiJiawenwen avatar Jun 27 '23 08:06 ShiJiawenwen

When users fine-tune this model for different downstream tasks, do they need to select experts?

ShiJiawenwen avatar Jun 27 '23 09:06 ShiJiawenwen