Transformers-Tutorials
Issue while training Donut model for parsing with custom decoder and tokenizer
Hey all, I was trying to train the Donut model for parsing documents that contain Arabic (only) information. To achieve this I collected an Arabic corpus from various sources and then trained:

- an MBart tokenizer on the Arabic corpus,
- an MBart decoder on the same dataset.
Initially the model was training well, meaning the loss was decreasing gradually, but during validation every token in my dataset is predicted as <UNK>. Because of this the Normed ED value is above 0.9, yet the loss keeps decreasing.
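One thing worth checking (a minimal sketch with a toy tokenizer, not the actual Donut/MBart API — the function names here are mine): if the tokenizer used to build the label ids maps most Arabic tokens to <unk>, the model can minimise the loss simply by predicting <unk> everywhere, which would match the symptom of a falling loss with all-<UNK> predictions.

```python
UNK_ID = 0

def encode(vocab: dict, text: str) -> list:
    """Toy whitespace tokenizer: unknown words fall back to UNK_ID."""
    return [vocab.get(tok, UNK_ID) for tok in text.split()]

def unk_fraction(vocab: dict, text: str) -> float:
    """Fraction of encoded ids that are <unk>; should be near 0 on your corpus."""
    ids = encode(vocab, text)
    return sum(i == UNK_ID for i in ids) / max(len(ids), 1)

# A vocab that does not cover the Arabic surface forms encodes everything as <unk>:
latin_only = {"<unk>": 0, "hello": 1, "world": 2}
print(unk_fraction(latin_only, "مرحبا بالعالم"))   # → 1.0

# A vocab that actually covers the corpus gives ~0:
arabic_vocab = {"<unk>": 0, "مرحبا": 1, "بالعالم": 2}
print(unk_fraction(arabic_vocab, "مرحبا بالعالم"))  # → 0.0
```

Running the same kind of check with the real tokenizer (encode a few ground-truth strings and count `tokenizer.unk_token_id` occurrences) would confirm whether the labels themselves are already mostly <unk>.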
Is there anything I am missing? Any inputs will help a lot. @gwkrsrch, @Vadkoz, @NielsRogge. Thanks and regards.