thai2transformers
Missing model_max_length in roberta config
When the tokenizer is loaded with transformers.AutoTokenizer.from_pretrained, model_max_length is set to 1000000000000000019884624838656.
This results in "IndexError: index out of range in self" when the model is used with flair, as in the code below.
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

sentence = Sentence("ตัวอย่างประโยคภาษาไทย")  # any input sentence
wangchanberta = TransformerDocumentEmbeddings('airesearch/wangchanberta-base-att-spm-uncased')
wangchanberta.embed(sentence)
After searching, I found this issue: https://github.com/huggingface/transformers/issues/14315#issuecomment-964363283, which states that model_max_length is missing from the tokenizer configuration file.
My current workaround is to manually run the following code to override the missing config.
wangchanberta.tokenizer.model_max_length = 510
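For anyone hitting the same problem, the workaround can be wrapped in a small guard. When model_max_length is absent from the config, transformers falls back to the sentinel int(1e30) (which is exactly the 1000000000000000019884624838656 value above), so we only clamp when we see that sentinel. Note that clamp_model_max_length is a hypothetical helper written for this issue, not part of thai2transformers or transformers:

```python
# Sentinel transformers assigns when model_max_length is missing from the config.
VERY_LARGE_INTEGER = int(1e30)  # == 1000000000000000019884624838656

def clamp_model_max_length(tokenizer, max_length=510):
    """Clamp model_max_length only if it was left at the sentinel.

    Leaves tokenizers that already carry a sane limit untouched, so the
    guard is safe to apply unconditionally. (Hypothetical helper.)
    """
    if getattr(tokenizer, "model_max_length", VERY_LARGE_INTEGER) >= VERY_LARGE_INTEGER:
        tokenizer.model_max_length = max_length
    return tokenizer
```

With flair, this would be applied as clamp_model_max_length(wangchanberta.tokenizer) before calling embed. Alternatively, I believe the limit can be passed directly at load time, e.g. AutoTokenizer.from_pretrained('airesearch/wangchanberta-base-att-spm-uncased', model_max_length=510), though fixing tokenizer_config.json upstream would be the proper solution.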