OpenPrompt
Can't load tokenizer for 'xlm-roberta-base'.
This is what I get when trying to load xlm-roberta-base:

```python
from openprompt.plms import load_plm
plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "xlm-roberta-base")
```
```
OSError                                   Traceback (most recent call last)
<ipython-input-3-bc593607bff3> in <module>
      1 from openprompt.plms import load_plm
----> 2 plm, tokenizer, model_config, WrapperClass = load_plm("roberta", "xlm-roberta-base")

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1758     if all(full_file_name is None for full_file_name in resolved_vocab_files.values()):
   1759         raise EnvironmentError(
-> 1760         f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
   1761         "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "
   1762         f"Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a directory "

OSError: Can't load tokenizer for 'xlm-roberta-base'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'xlm-roberta-base' is the correct path to a directory containing all relevant files for a RobertaTokenizer tokenizer.
```

Any help is much appreciated, thanks!
Look at https://github.com/thunlp/OpenPrompt/blob/main/openprompt/plms/__init__.py#L87. 'xlm-roberta-base' is not the same as "roberta": it uses XLMRobertaConfig rather than RobertaConfig, and XLMRobertaTokenizer instead of RobertaTokenizer.
You could either add an "xlm-roberta" entry to `_MODEL_CLASSES` there, or copy the body of `load_plm` into your Jupyter notebook and change `model_class.config`, `model_class.tokenizer`, etc. to their XLM-RoBERTa counterparts.
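For reference, a new entry would follow the same namedtuple pattern OpenPrompt uses in `openprompt/plms/__init__.py`. The sketch below uses empty placeholder classes so it runs standalone; in the real file the config/tokenizer/model classes come from `transformers` (XLMRobertaConfig, XLMRobertaTokenizer, XLMRobertaForMaskedLM) and the wrapper from `openprompt.plms.mlm`, and the key name `"xlm-roberta"` is my own choice, not an existing OpenPrompt key:

```python
from collections import namedtuple

# Stand-ins for the real classes; in OpenPrompt these would be imported
# from transformers and openprompt.plms.mlm instead of defined here.
class XLMRobertaConfig: pass
class XLMRobertaTokenizer: pass
class XLMRobertaForMaskedLM: pass
class MLMTokenizerWrapper: pass

# Same pattern as OpenPrompt's ModelClass namedtuple.
ModelClass = namedtuple("ModelClass", ("config", "tokenizer", "model", "wrapper"))

_MODEL_CLASSES = {
    "xlm-roberta": ModelClass(**{
        "config": XLMRobertaConfig,
        "tokenizer": XLMRobertaTokenizer,
        "model": XLMRobertaForMaskedLM,
        "wrapper": MLMTokenizerWrapper,
    }),
}

def get_model_class(plm_type):
    # Mirrors openprompt.plms.get_model_class: a plain dict lookup,
    # which is why an unregistered model name raises KeyError.
    return _MODEL_CLASSES[plm_type]
```

Because the lookup is a plain dict access, any model name not added to `_MODEL_CLASSES` fails with a KeyError, which matters for the discussion below.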
@Achazwl thank you! Are there any future plans to extend the framework to XLM-R as well?
Dear all, I have a question about modifying `__init__.py`; please guide me. I want to use the SciBERT model from Hugging Face, so I tried to add the model and tokenizer to `__init__.py` in Colab, but I don't know what the config or wrapper should be. After that I closed `__init__.py` and ran again, but SciBERT is not recognized. How can I test other Hugging Face models?
After you modify the code, reload it in your Python working space, e.g.:

```python
import importlib
import openprompt
openprompt = importlib.reload(openprompt)
load_plm = openprompt.plms.load_plm
```

(`importlib.reload` replaces the deprecated `from imp import reload`.) You should also make sure your modified checkout is the copy being imported:

```python
import sys
sys.path.insert(0, '/location_path/OpenPrompt')
```
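Here is a minimal, self-contained illustration of why the explicit reload step matters: Python caches imported modules in `sys.modules`, so re-running an `import` does not pick up an edited file. The module name `mymod` below is just a throwaway stand-in for the edited OpenPrompt file:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # keep this demo free of stale .pyc caches

# Create a throwaway module standing in for the edited OpenPrompt file.
workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "mymod.py").write_text("VALUE = 1\n")
sys.path.insert(0, str(workdir))

import mymod
assert mymod.VALUE == 1

# Simulate editing the file on disk (e.g. adding a model entry).
(workdir / "mymod.py").write_text("VALUE = 2  # edited\n")

import mymod                     # no effect: Python reuses sys.modules
assert mymod.VALUE == 1

mymod = importlib.reload(mymod)  # re-executes the edited source
assert mymod.VALUE == 2
```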
Thank you for your reply. I changed the code in Colab as below:
1- add this model to `__init__.py`:

```python
'PubMedBERT': ModelClass(**{
    'config': BertConfig,
    'tokenizer': AutoTokenizer.from_pretrained('microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext'),
    'model': AutoModel.from_pretrained('microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext'),
    'wrapper': MLMTokenizerWrapper,
}),
```
2- reload the module:

```python
import sys
import importlib
sys.path.insert(0, '/content/OpenPrompt')
importlib.reload(sys)
```
3- run the cell:

```python
plm, tokenizer, model_config, WrapperClass = load_plm("PubMedBERT", 'microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext')
```
4- I get this error:
```
KeyError                                  Traceback (most recent call last)
/content/OpenPrompt/OpenPrompt/openprompt/plms/__init__.py in get_model_class(plm_type)
     89         "tokenizer": GPT2Tokenizer,
     90         "model": GPTJForCausalLM,
---> 91         "wrapper": LMTokenizerWrapper
     92     }),
     93 }

KeyError: 'PubMedBERT'
```
Adding a model results in an error; I probably didn't reload the module correctly. Your guidance in this regard would be very valuable.
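One likely culprit (a hedged guess, since I can't run your notebook): `importlib.reload(sys)` reloads the wrong module. The reload target must be the edited module itself, and the edited checkout must sit first on `sys.path` so that reload re-resolves to it. A self-contained demonstration, with a throwaway module `plms_demo` standing in for the edited OpenPrompt file:

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc caches in this demo

# Two copies of the "same" module: the original and an edited checkout.
base = pathlib.Path(tempfile.mkdtemp())
for sub, body in (("orig", "MODELS = ['gpt-j']\n"),
                  ("edited", "MODELS = ['gpt-j', 'PubMedBERT']\n")):
    (base / sub).mkdir()
    (base / sub / "plms_demo.py").write_text(body)

sys.path.append(str(base / "orig"))       # the copy Python saw first
import plms_demo
assert "PubMedBERT" not in plms_demo.MODELS

# The edited checkout must come FIRST on sys.path, and the reload target
# must be the module itself -- reloading sys (as in the snippet above)
# leaves plms_demo untouched.
sys.path.insert(0, str(base / "edited"))
plms_demo = importlib.reload(plms_demo)   # reload re-resolves via sys.path
assert "PubMedBERT" in plms_demo.MODELS
```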
Hi,
Here is one thing you can try that avoids OpenPrompt's `load_plm` function entirely. The SciBERT model should still work with OpenPrompt's MLM tokenizer wrapper, so you can load the components separately and then piece them together.
Imports:

```python
from openprompt.plms.seq2seq import T5TokenizerWrapper, T5LMTokenizerWrapper
from openprompt.plms.lm import LMTokenizerWrapper
from openprompt.plms.mlm import MLMTokenizerWrapper
from transformers import T5Config, T5Tokenizer, T5ForConditionalGeneration
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoModelForMaskedLM, AutoTokenizer
```
Load components separately:

```python
model_name = "your_model_name_here"
plm = AutoModelForMaskedLM.from_pretrained(model_name)
WrapperClass = MLMTokenizerWrapper
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
```
Then you pass these to the prompt dataloader as you normally would. I don't have time right now to test this for the models mentioned in this issue, but it has worked for me with custom models, and SciBERT should work directly with OpenPrompt's MLMTokenizerWrapper under the hood.
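To see the one real decision at a glance: when bypassing `load_plm`, you only have to match the tokenizer wrapper to the model's pretraining objective. The classes below are empty stand-ins for OpenPrompt's actual wrappers so this sketch runs standalone, and `pick_wrapper` with its objective labels is my own illustrative helper, not OpenPrompt API:

```python
# Empty stand-ins for the wrappers in openprompt.plms.{mlm,lm,seq2seq}.
class MLMTokenizerWrapper: pass
class LMTokenizerWrapper: pass
class T5TokenizerWrapper: pass

# Pretraining objective -> wrapper, mirroring the pairings in _MODEL_CLASSES.
WRAPPERS = {
    "mlm": MLMTokenizerWrapper,      # BERT, SciBERT, PubMedBERT, XLM-R
    "lm": LMTokenizerWrapper,        # GPT-2, GPT-J
    "seq2seq": T5TokenizerWrapper,   # T5
}

def pick_wrapper(objective):
    """Return the wrapper class for a given pretraining objective."""
    try:
        return WRAPPERS[objective]
    except KeyError:
        raise ValueError(f"unknown objective {objective!r}; "
                         f"expected one of {sorted(WRAPPERS)}") from None
```

For a masked-language model like SciBERT, `pick_wrapper("mlm")` plays the role of `WrapperClass` in the snippet above.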
Hello, I have received your message and will handle it as soon as possible. Thank you!
Hi
Thank you very much for your time and explanation.