[TrOCR] TrOCR processor issue with small version
When I try to use the small version of TrOCR, the processor fails because the tokenizer cannot be converted to a fast tokenizer. The code below reproduces the issue:

```python
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-small-handwritten")
```

The error is:
```
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-small-handwritten")
  File "/home/.local/lib/python3.8/site-packages/transformers/processing_utils.py", line 186, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/.local/lib/python3.8/site-packages/transformers/processing_utils.py", line 230, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/home/.local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 591, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1805, in from_pretrained
    return cls._from_pretrained(
  File "/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1950, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/.local/lib/python3.8/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 155, in __init__
    super().__init__(
  File "/home/.local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 113, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/home/.local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 1111, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/home/.local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 426, in __init__
    from .utils import sentencepiece_model_pb2 as model_pb2
  File "/home/.local/lib/python3.8/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 34, in <module>
    create_key=_descriptor._internal_create_key,
AttributeError: module 'google.protobuf.descriptor' has no attribute '_internal_create_key'
```
`sentencepiece` is already installed. The same error occurs with trocr-large-stage1.
When I replace the processor with the base one, it works, but it gives very bad results on the IAM dataset (CER = 57.3).
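For context, the CER number above is the character-level edit distance between hypothesis and reference, divided by the reference length. A minimal pure-Python sketch of the metric (not the evaluation script I used):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n] / m if m else 0.0
```

A CER of 57.3 (i.e. 0.573) means more than half the reference characters would need to be edited, which is why mixing the base processor with the small model is not a usable workaround.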
@Mohammed20201991
You may need to upgrade the protobuf version.
Ref: https://stackoverflow.com/questions/61922334/how-to-solve-attributeerror-module-google-protobuf-descriptor-has-no-attribu
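Following that suggestion, a quick sanity check for whether the installed protobuf exposes the attribute the converter needs (a sketch; the helper name is mine, not from transformers):

```python
import importlib

def protobuf_supports_internal_create_key() -> bool:
    """Return True if the installed protobuf exposes _internal_create_key,
    which transformers' sentencepiece_model_pb2 module requires."""
    try:
        descriptor = importlib.import_module("google.protobuf.descriptor")
    except ImportError:
        return False
    return hasattr(descriptor, "_internal_create_key")

if protobuf_supports_internal_create_key():
    print("protobuf is new enough")
else:
    print("protobuf is too old or missing; try: pip install --upgrade protobuf")
```

If the check fails, upgrading with `pip install --upgrade protobuf` and restarting the interpreter should resolve the AttributeError in the traceback above.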