unilm
LayoutLMv3 | Domain adaptation on the base model
I'm using the base model from LayoutLMv3 and trying to adapt it to my own local data. This data is unlabeled, so I want to continue pre-training the base model on it. I'm having trouble figuring out how to mask the data and which collator to pass to the Trainer. Currently my data has this structure:
from datasets import Features, Sequence, Value, Array2D, Array3D

features = Features({
    'input_ids': Sequence(Value(dtype='int64')),
    'attention_mask': Sequence(Value(dtype='int64')),
    'bbox': Array2D(dtype='int64', shape=(512, 4)),
    'pixel_values': Array3D(dtype='float32', shape=(3, 224, 224)),
})
To mask the text part I'm using DataCollatorForLanguageModeling, but this only masks the text and doesn't include the image information. Does anyone know how to do this?
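For the text side, the masking recipe DataCollatorForLanguageModeling applies is the standard BERT-style 80/10/10 scheme, and the `bbox` and `pixel_values` columns just need to be passed through unchanged. Below is a minimal dependency-free sketch of that masking step so the logic is explicit; the token ids and special-token set are hypothetical placeholders (in practice you would take `tokenizer.mask_token_id`, `tokenizer.vocab_size`, and the tokenizer's special ids, and keep using the Hugging Face collator for batching):

```python
import random

MASK_TOKEN_ID = 50264   # hypothetical; use tokenizer.mask_token_id in practice
VOCAB_SIZE = 50265      # hypothetical; use tokenizer.vocab_size in practice
SPECIAL_IDS = {0, 2}    # hypothetical special tokens, e.g. <s> and </s>

def mask_tokens(input_ids, mlm_probability=0.15, rng=None):
    """Return (masked_ids, labels) using the BERT-style 80/10/10 recipe.

    Labels are -100 (ignored by the cross-entropy loss) everywhere except
    at masked positions, where they hold the original token id. bbox and
    pixel_values are untouched by this step; a custom collator would simply
    copy them into the batch alongside the masked input_ids.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in input_ids:
        if tok not in SPECIAL_IDS and rng.random() < mlm_probability:
            labels.append(tok)                 # predict the original token here
            r = rng.random()
            if r < 0.8:                        # 80%: replace with [MASK]
                masked.append(MASK_TOKEN_ID)
            elif r < 0.9:                      # 10%: replace with a random token
                masked.append(rng.randrange(VOCAB_SIZE))
            else:                              # 10%: keep the original token
                masked.append(tok)
        else:
            labels.append(-100)                # position ignored by the loss
            masked.append(tok)
    return masked, labels
```

A custom collator could call DataCollatorForLanguageModeling (or this logic) on `input_ids` only, then stack `bbox` and `pixel_values` into the same batch dict, so the image and layout inputs reach the model unmasked while only the text carries MLM labels.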
You may refer to LayoutLMv3's paper and BEiT's code for image masking (the Masked Image Modeling objective).
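For reference, BEiT's masked image modeling does not mask patches independently; its MaskingGenerator grows random rectangular blocks over the patch grid until a target number of patches is masked. Here is a small sketch of that block-wise sampling, assuming a 14x14 patch grid (224x224 image, 16x16 patches) and the aspect-ratio range from the BEiT code; exact defaults may differ from the official implementation:

```python
import math
import random

def blockwise_mask(grid=14, num_mask=75, min_block=4, max_block=None, rng=None):
    """Sample a block-wise patch mask in the spirit of BEiT's MaskingGenerator.

    Returns a grid x grid list of 0/1 flags with exactly num_mask ones,
    grown from random rectangular blocks rather than i.i.d. patches.
    """
    rng = rng or random.Random()
    max_block = max_block or num_mask
    mask = [[0] * grid for _ in range(grid)]
    count = 0
    while count < num_mask:
        target = rng.randint(min_block, max_block)
        # Sample an aspect ratio log-uniformly in [0.3, 1/0.3], as BEiT does.
        ar = math.exp(rng.uniform(math.log(0.3), math.log(1 / 0.3)))
        h = min(grid, max(1, int(round(math.sqrt(target * ar)))))
        w = min(grid, max(1, int(round(math.sqrt(target / ar)))))
        top = rng.randint(0, grid - h)
        left = rng.randint(0, grid - w)
        # Mark the block, stopping once the masking budget is reached.
        for i in range(top, top + h):
            for j in range(left, left + w):
                if mask[i][j] == 0 and count < num_mask:
                    mask[i][j] = 1
                    count += 1
    return mask
```

During pre-training, the masked patch positions would be where the model predicts the visual tokens produced by the image tokenizer, per the LayoutLMv3 paper's MIM objective.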