
LayoutLMv3 extending language sequence length

Open ChristiaensBert opened this issue 2 years ago • 6 comments

I want to use LayoutLMv3 on full documents that have a text sequence length of more than 512. Is there a way to extend this and how should it be done?

Alternatively, could I split up the document into 2 sequences and forward them both with the image, or will this lose too much context?

ChristiaensBert avatar Jun 08 '22 12:06 ChristiaensBert

@ChristiaensBert Yes, splitting the document into multiple sequences and forwarding each with the image is common practice.
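A minimal sketch of that splitting approach: break the long word/box sequence into overlapping windows that each fit under the 512-token limit, and pair every window with the same page image. The window and stride sizes here are illustrative assumptions (and note the limit applies to subword tokens, so the real per-window word count is usually smaller than 512):

```python
# Hypothetical helper (not part of the LayoutLMv3 codebase): split long
# documents into overlapping windows so each fits the 512-token limit.
def split_into_windows(words, boxes, max_len=512, stride=128):
    """Yield (words, boxes) chunks of at most max_len with `stride` overlap."""
    assert len(words) == len(boxes)
    step = max_len - stride
    chunks = []
    for start in range(0, len(words), step):
        chunks.append((words[start:start + max_len],
                       boxes[start:start + max_len]))
        if start + max_len >= len(words):
            break
    return chunks

# Example: a 900-word document becomes three overlapping windows,
# each of which would be encoded together with the same page image.
words = [f"w{i}" for i in range(900)]
boxes = [[0, 0, 1, 1]] * 900
chunks = split_into_windows(words, boxes)
```

The overlap (stride) gives each window some shared context with its neighbors, which softens the context loss the question worries about; predictions in the overlapping region can then be merged or deduplicated.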

wolfshow avatar Jun 09 '22 06:06 wolfshow

I have trained a LayoutLMv3 model with `"bbox": Array2D(dtype="int64", shape=(512, 4))`, but my documents have up to 928 boxes, so the trained model does not predict labels for all words (tokens).

I tried changing 512 to 1024 and 2048, but during training I get `ValueError: cannot reshape array of size 2048 into shape (1,1024,4)`.

Does anyone know how to change the config, or have an idea how to solve this?
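The error message suggests the `datasets` features and the tokenizer disagree: 2048 = 512 × 4, so the encoder is still emitting 512 boxes per example while the `Array2D` feature declares 1024 rows. A sketch of the mismatch, with the numbers taken from the error message (changing the feature shape alone is not enough; the tokenizer's `max_length` must match, and the model's position embeddings must also be extended past 512 for training to work):

```python
import numpy as np

# The tokenizer was left at max_length=512, so each example still carries
# 512 boxes -> 512 * 4 = 2048 values in total.
flat_bboxes = np.zeros(512 * 4)

# Declaring Array2D(shape=(1024, 4)) makes `datasets` attempt this reshape,
# which fails because 2048 values cannot fill a (1, 1024, 4) array:
try:
    flat_bboxes.reshape(1, 1024, 4)
except ValueError as e:
    print(e)

# Keeping the tokenizer max_length and the Array2D shape consistent
# makes the reshape succeed:
flat_bboxes = np.zeros(1024 * 4)
bboxes = flat_bboxes.reshape(1, 1024, 4)
```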

rusubbiz-muzkaq avatar Aug 09 '22 14:08 rusubbiz-muzkaq

Hi @rusubbiz-muzkaq,

Did you find a way to work with lengths of more than 512 tokens on LayoutLMv3? I am also getting the same error.

jyotiyadav94 avatar Oct 30 '22 15:10 jyotiyadav94

Hi, I have the same problem as @rusubbiz-muzkaq and @jyotiyadav94 and haven't figured it out yet. Any updates?

Edit: https://github.com/NielsRogge/Transformers-Tutorials/issues/203

freeZe2511 avatar Dec 21 '22 12:12 freeZe2511

Hi all,

I got it working for a longer sequence length. See https://github.com/microsoft/unilm/issues/942#issuecomment-1429033957.

Thank you :)

arvindrajan92 avatar Feb 14 '23 02:02 arvindrajan92

Hi all! I have explained my solution for handling longer token sequences here; hope it helps:

https://github.com/huggingface/transformers/issues/19190#issuecomment-1441883471

alitavanaali avatar Feb 23 '23 14:02 alitavanaali
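For reference, the linked solutions revolve around extending the model's learned position embeddings beyond 512 so it can accept longer inputs. A rough sketch of that general idea with plain arrays; the sizes, the +2 padding offset, and the tiling-based initialization are assumptions for illustration, not the exact code from those comments:

```python
import numpy as np

hidden = 768
# Pretrained position-embedding weights: 512 positions plus a small
# padding offset (assumed +2 here), each a `hidden`-dim vector.
rng = np.random.default_rng(0)
old_pos = rng.standard_normal((512 + 2, hidden))

# Build a larger table for a new maximum length of 1024 positions.
new_len = 1024
new_pos = np.empty((new_len + 2, hidden))

# Copy the pretrained rows, then initialize the extra positions by
# tiling the learned embeddings (one of several possible heuristics;
# the new rows would still be adjusted during fine-tuning).
new_pos[: 512 + 2] = old_pos
reps = -(-(new_len - 512) // 512)  # ceil division
new_pos[512 + 2 :] = np.tile(old_pos[2:], (reps, 1))[: new_len - 512]
```

The extended table would then replace the model's position-embedding weights before fine-tuning on longer sequences; see the two comments linked above for working end-to-end versions.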