Question about BEiT-3 pre-training
Thank you very much for your work. I have recently been studying the BEiT-3 model and have a few questions I would appreciate answers to:
- Is the training data shuffled during pre-training? Does it mix images, text, and image-text pairs?
- If the data are mixed, how are the different experts selected during training? An MoE structure typically includes a gating network, but this model does not seem to have one (my current understanding is in the sketch at the end of this post).
- When fine-tuning the model for different downstream tasks, do users need to select experts manually?

I look forward to your response.
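For context, here is a minimal sketch of how I currently understand the expert selection: a Multiway-style feed-forward block where each token is routed to a modality-specific expert by whether it is an image or a text token, so no learned gating network is needed. All class and argument names below are my own and may not match the actual implementation in this repo.

```python
import torch
import torch.nn as nn


class MultiwayFFN(nn.Module):
    """Sketch of modality-routed feed-forward experts (my assumption, not the repo's code)."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        # One feed-forward expert per modality; the shared self-attention
        # that would precede this block is omitted here.
        self.vision_expert = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )
        self.language_expert = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim)
        )

    def forward(self, x: torch.Tensor, is_vision: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); is_vision: (batch, seq_len) boolean mask
        # marking image tokens. For an image-text pair, image tokens pass
        # through the vision expert and text tokens through the language
        # expert, so the "routing" is fixed by modality rather than learned.
        out = torch.empty_like(x)
        out[is_vision] = self.vision_expert(x[is_vision])
        out[~is_vision] = self.language_expert(x[~is_vision])
        return out
```

If this is roughly correct, my second question is essentially whether the same modality-based routing is kept unchanged during fine-tuning, or whether a particular expert has to be chosen for each downstream task.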