
Questions about pre-training of MarkupLM

Open • skygl opened this issue 3 years ago • 1 comment

Model I am using: MarkupLM

I have some questions about pre-training of MarkupLM.

  1. Many web pages contain long text. How did you handle long pages?
  2. How is a web page node preprocessed when it exceeds the maximum depth?

skygl • Jul 26 '22, 11:07

@skygl

  1. For long documents, we use the same pre-processing as LayoutLM: we split the documents into blocks with a length of 512.
  2. Just trim the deeper nodes.

wolfshow • Aug 16 '22, 03:08
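For readers wanting something concrete, here is a minimal sketch of the two steps described in the reply. It rests on assumptions that are not from the thread. First, "blocks with a length of 512" is read as consecutive, non-overlapping chunks of token ids, with no room reserved for special tokens. Second, "trim the deeper nodes" is read as dropping every node whose XPath is deeper than a maximum depth. `MAX_DEPTH = 50` is an illustrative value, and `split_into_blocks`, `xpath_depth`, and `trim_deep_nodes` are hypothetical helpers, not MarkupLM's actual pre-processing code.

```python
from typing import List, Tuple

# Assumed values for illustration; not confirmed as MarkupLM's pre-training settings.
BLOCK_LEN = 512
MAX_DEPTH = 50


def split_into_blocks(token_ids: List[int], block_len: int = BLOCK_LEN) -> List[List[int]]:
    """Split a long token sequence into consecutive, non-overlapping blocks.

    Every block has block_len tokens except possibly the last one.
    No room is reserved for special tokens such as [CLS]/[SEP] (an assumption).
    """
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    return [token_ids[i:i + block_len] for i in range(0, len(token_ids), block_len)]


def xpath_depth(xpath: str) -> int:
    """Depth of a node = number of non-empty steps in its XPath.

    Example: '/html/body/div[2]' has the steps html, body, div[2], so depth 3.
    """
    return len([step for step in xpath.split("/") if step])


def trim_deep_nodes(
    nodes: List[Tuple[str, str]], max_depth: int = MAX_DEPTH
) -> List[Tuple[str, str]]:
    """Keep only (xpath, text) nodes whose depth does not exceed max_depth."""
    return [(xp, text) for xp, text in nodes if xpath_depth(xp) <= max_depth]


if __name__ == "__main__":
    # A 1200-token page becomes three blocks of 512, 512 and 176 tokens.
    blocks = split_into_blocks(list(range(1200)))
    print([len(b) for b in blocks])

    # The second node is 61 levels deep (html + 60 divs), so it is dropped.
    nodes = [("/html/body/div", "short"), ("/html" + "/div" * 60, "deep")]
    print(trim_deep_nodes(nodes))
```

A real pipeline would probably also carry each token's XPath along with it when splitting, so that every 512-token block keeps its markup information. Another reading of "trim" is to truncate an over-long XPath to its first `MAX_DEPTH` steps while keeping the node. The reply doesn't say which of the two was used.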