unilm
Questions about pre-training of MarkupLM
Model I am using: MarkupLM
I have some questions about pre-training of MarkupLM.
- Many web pages contain long text. How did you handle long pages?
- How is a web page node preprocessed when its depth exceeds the maximum depth?
@skygl 1. For long documents, we use the same pre-processing as LayoutLM: we split the documents into blocks with a length of 512. 2. We just trim the deeper nodes.
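For readers who want to reproduce the two steps above, here is a minimal sketch, not the authors' actual pre-processing code. It assumes the block length of 512 is counted in tokens and ignores any special tokens, and it reads "trim the deeper nodes" as keeping only the first `max_depth` steps of each node's XPath. The function names `split_into_blocks` and `trim_xpath` and the `max_depth` parameter are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_blocks(tokens: Sequence[T], block_size: int = 512) -> List[List[T]]:
    """Split a long token sequence into consecutive blocks of at most block_size.

    Assumption: blocks do not overlap and the last block may be shorter.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(tokens[i:i + block_size]) for i in range(0, len(tokens), block_size)]


def trim_xpath(xpath: str, max_depth: int) -> str:
    """Keep only the first max_depth steps of an XPath such as /html/body/div[1].

    Assumption: 'trimming deeper nodes' means truncating the path, not
    dropping the node's text from the input.
    """
    steps = [step for step in xpath.split("/") if step]
    return "/" + "/".join(steps[:max_depth])


if __name__ == "__main__":
    blocks = split_into_blocks(list(range(1030)))
    print([len(block) for block in blocks])
    print(trim_xpath("/html/body/div[1]/span", max_depth=2))
```

Whether the original pipeline truncates the path or discards the deep nodes entirely is not stated in the reply, so treat the second function as one possible reading.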