unilm
LayoutLMv3 extending language sequence length
I want to use LayoutLMv3 on full documents whose text sequence length exceeds 512 tokens. Is there a way to extend this limit, and how should it be done?
Alternatively, could I split the document into two sequences and forward both together with the image, or would that lose too much context?
@ChristiaensBert Yes, this is common practice.
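One way to sketch that splitting approach, assuming word-level inputs (the function name, window size, and stride below are illustrative, not from LayoutLMv3 itself): overlapping windows keep some context across chunk boundaries, and each (words, boxes) chunk can then be passed to the processor together with the page image.

```python
def chunk_document(words, boxes, window=512, stride=128):
    """Split a long document into overlapping (words, boxes) windows.

    The overlap (``stride`` words) preserves some context across
    chunk boundaries, at the cost of scoring those words twice.
    """
    assert len(words) == len(boxes)
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + window, len(words))
        chunks.append((words[start:end], boxes[start:end]))
        if end == len(words):
            break
        start = end - stride  # step back to create the overlap
    return chunks

# Example: a 928-word document (the size mentioned later in this thread)
words = [f"w{i}" for i in range(928)]
boxes = [[0, 0, 10, 10]] * 928
chunks = chunk_document(words, boxes)
print(len(chunks), [len(w) for w, _ in chunks])  # 3 [512, 512, 160]
```

Note that `window` here counts words, while the 512 limit is on subword tokens, so in practice a smaller window (or tokenizer-side truncation with `return_overflowing_tokens=True`) is needed.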
I trained a LayoutLMv3 model with "bbox": Array2D(dtype="int64", shape=(512, 4)), but my documents contain up to 928 boxes, so the trained model does not predict labels for all words (tokens).
I tried changing 512 to 1024 and 2048, but during training I get:
ValueError: cannot reshape array of size 2048 into shape (1,1024,4)
Does anyone know how to change the config file, or have any idea how to solve this problem?
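The error looks like a shape mismatch between the datasets feature declaration and what the processor actually returns: 512 boxes × 4 coordinates = 2048 values, which cannot fill a (1024, 4) feature. If you raise the feature shape to (1024, 4), you likely also need to tokenize with max_length=1024 and padding="max_length" so the processor emits 1024 boxes (and, as the later comments discuss, the model's position embeddings must be extended as well). A small NumPy illustration of the mismatch, under that assumption:

```python
import numpy as np

# The processor still emitted 512 boxes (512 * 4 = 2048 values), but the
# dataset feature was declared as Array2D(shape=(1024, 4)), which needs
# 1024 * 4 = 4096 values -- hence the reshape failure.
bbox = np.zeros((512, 4), dtype="int64")
try:
    bbox.reshape(1, 1024, 4)
except ValueError as e:
    msg = str(e)
print(msg)  # cannot reshape array of size 2048 into shape (1,1024,4)

# Padding the boxes to the declared length makes the shapes consistent,
# which is what padding="max_length" with max_length=1024 would do.
padded = np.zeros((1024, 4), dtype="int64")
padded[: len(bbox)] = bbox
print(padded.reshape(1, 1024, 4).shape)  # (1, 1024, 4)
```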
Hi @rusubbiz-muzkaq,
Did you find a way to work with lengths of more than 512 tokens on LayoutLMv3? I am also getting the same error.
Hi, I have the same problem as @rusubbiz-muzkaq and @jyotiyadav94 and haven't figured it out yet. Any updates?
Edit: https://github.com/NielsRogge/Transformers-Tutorials/issues/203
Hi all,
I got it working for a longer sequence length. See https://github.com/microsoft/unilm/issues/942#issuecomment-1429033957.
Thank you :)
Hi all! I have explained my solution for handling long token sequences here; I hope it helps:
https://github.com/huggingface/transformers/issues/19190#issuecomment-1441883471
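For anyone who goes the chunking route instead of extending the model, the per-chunk predictions still need to be merged back into document-level labels. A minimal sketch (function names and the majority-vote policy are my own choices, not from the linked solutions; taking the prediction from the chunk where a word is furthest from a boundary is another common policy):

```python
from collections import Counter, defaultdict

def merge_predictions(chunk_offsets, chunk_labels):
    """Recombine overlapping per-chunk word labels by majority vote.

    chunk_offsets[i] is the document index of the first word in chunk i;
    chunk_labels[i] is the list of word-level labels for chunk i.
    """
    votes = defaultdict(list)
    for offset, labels in zip(chunk_offsets, chunk_labels):
        for j, label in enumerate(labels):
            votes[offset + j].append(label)
    return [Counter(votes[i]).most_common(1)[0][0] for i in sorted(votes)]

# Two 4-word chunks covering words 0-3 and 2-5; words 2-3 are scored twice.
merged = merge_predictions(
    [0, 2],
    [["O", "B", "I", "I"], ["I", "I", "O", "O"]],
)
print(merged)  # ['O', 'B', 'I', 'I', 'O', 'O']
```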