Transformers-Tutorials
Donut model size beyond 768*2 max_length
Is there any way to go beyond a max_length of 768*2? I tried training the model with 768*4 as the max_length and sufficient GPU memory, but it gives an internal CUDA error (not related to memory usage).
Is there any way to achieve a greater max_length, or is it just a model limitation?
I am also looking for the answer to this.
Any conclusions?
I think you would need to interpolate the position embeddings of the pre-trained text decoder for the model to go beyond 768 tokens.
As done here: https://github.com/clovaai/donut/blob/4cfcf972560e1a0f26eb3e294c8fc88a0d336626/donut/model.py#L188-L195
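The idea behind the linked code can be sketched as follows: treat the learned position-embedding matrix of shape `(old_len, dim)` as `dim` independent 1D signals and linearly resample each of them to the new length. This is a minimal NumPy sketch of that interpolation step (the original Donut code does the same thing with `torch.nn.functional.interpolate` in linear mode); the function name `interpolate_pos_embeddings` is my own, not from the repo.

```python
import numpy as np

def interpolate_pos_embeddings(weight: np.ndarray, new_len: int) -> np.ndarray:
    """Linearly resample a learned position-embedding matrix to a new length.

    weight: (old_len, dim) pretrained position embeddings.
    Returns a (new_len, dim) matrix whose first and last rows match the
    original endpoints, with intermediate positions linearly interpolated.
    """
    old_len, dim = weight.shape
    old_x = np.linspace(0.0, 1.0, old_len)
    new_x = np.linspace(0.0, 1.0, new_len)
    # Interpolate each embedding dimension independently along the length axis.
    return np.stack(
        [np.interp(new_x, old_x, weight[:, d]) for d in range(dim)], axis=1
    )

# Illustrative shapes only (Donut's decoder uses a larger hidden size):
pretrained = np.random.randn(768, 1024)      # original max_length rows
extended = interpolate_pos_embeddings(pretrained, 768 * 4)
print(extended.shape)                        # (3072, 1024)
```

After resampling, you would copy the new matrix back into the decoder's position-embedding layer and update the model's `max_length` / `max_position_embeddings` config accordingly. Note that interpolation only makes longer sequences *representable*; some fine-tuning is usually needed for the model to perform well at the extended length.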