
Loss trends of Pretrain LayoutLMv3

Open kash203 opened this issue 2 years ago • 4 comments

Describe

Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutLMv3

I pre-trained a LayoutLMv3 base model, but convergence seems to begin at a high loss value: the MLM and MIM losses each plateau around 6–7 and the WPA loss around 0.5, so the total loss is about 12–15. These values seem high relative to typical MLM losses, which I find strange. Indeed, when I run inference with this model, the word part does not form coherent sentences, and the image part is mostly a white image with some noise.
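For a rough sanity check on those magnitudes: the cross-entropy of a uniform (untrained) predictor is ln(number of classes). A minimal sketch, assuming the RoBERTa BPE vocabulary (50,265 tokens) for MLM and an 8,192-entry dVAE codebook for MIM, as described in the paper — these class counts are my assumption, not something stated in this thread:

```python
import math

# Chance-level cross-entropy is ln(num_classes) for a uniform predictor.
chance_mlm = math.log(50265)  # MLM over the RoBERTa BPE vocabulary
chance_mim = math.log(8192)   # MIM over the dVAE image-token codebook
chance_wpa = math.log(2)      # WPA is a binary (aligned / not aligned) task

print(f"MLM chance loss ~ {chance_mlm:.1f}")  # ~ 10.8
print(f"MIM chance loss ~ {chance_mim:.1f}")  # ~ 9.0
print(f"WPA chance loss ~ {chance_wpa:.2f}")  # ~ 0.69
```

Under these assumptions, an MLM/MIM plateau at 6–7 is below random guessing but still far from well-converged values, which is consistent with the degraded inference output described above.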

Could you provide the pre-training log of the LayoutLMv3 base model?

There is one more thing I find strange: when training MLM with span masking, the model seems unable to recover top-1 predictions for positions beyond the length of the masked input. (I observed this when intentionally overfitting on a small dataset; the length of attention_mask was aligned with the data before masking.)

for example:

original data        : A B C D E F
masked data          : A [MASK] F
model output (top 1) : A B C F F F

↑ When the masked data length is 3, the model can only predict 3 tokens.

I would like to check whether the above behavior is correct.
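For reference, in BERT/LayoutLM-style span masking each token in the chosen span is replaced by its own [MASK], so the masked sequence keeps the original length and the encoder emits one prediction per position. A minimal sketch (`span_mask` is a hypothetical helper, not from the LayoutLMv3 codebase):

```python
def span_mask(tokens, start, end, mask_token="[MASK]"):
    """Replace each token in [start, end) with its own mask token,
    preserving sequence length (one prediction slot per position)."""
    return [mask_token if start <= i < end else tok
            for i, tok in enumerate(tokens)]

tokens = ["A", "B", "C", "D", "E", "F"]
masked = span_mask(tokens, 1, 5)
# masked -> ["A", "[MASK]", "[MASK]", "[MASK]", "[MASK]", "F"]
```

An encoder-only MLM can never emit more tokens than it has input positions, so if the whole span is collapsed into a single [MASK] (length 3, as in the example above), only 3 predictions are possible — matching the behavior described.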

Conditions:

  • Input length to the transformer is about 709: words (512) plus image patches (197)
    • batch size per GPU: about 50
    • gradient accumulation steps: 10 => 50 x 4 x 10 = 2,000 (effective batch size)
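The arithmetic behind that effective batch size can be sketched as follows (the 4-GPU count is inferred from the "50 x 4 x 10" figure above, not stated explicitly):

```python
per_gpu_batch = 50   # micro-batch size on each GPU
num_gpus = 4         # inferred from the multiplication above
accum_steps = 10     # gradient accumulation steps

# One optimizer step averages gradients over this many samples:
effective_batch = per_gpu_batch * num_gpus * accum_steps
print(effective_batch)  # 2000
```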

kash203 avatar Dec 20 '22 04:12 kash203

Hi, where is the code for pre-training?

yash0307 avatar Dec 29 '22 12:12 yash0307

Hi @yash0307, I don't believe it has been released, so I'm implementing it myself based on the papers. That's why I'd like to compare loss trends, much like checking answers.

kash203 avatar Dec 30 '22 10:12 kash203

Hello, could you please make your code public?

vanpersie32 avatar Jan 17 '23 08:01 vanpersie32

Hi, could you please share some details, such as the number of images, the training time, etc.?

hieutt196 avatar Feb 07 '23 06:02 hieutt196