CogQA
Error when loading input data after replacing BERT with ALBERT
Hello! While modifying the model code, I tried replacing BERT with ALBERT. I changed
BERT_MODEL = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(BERT_MODEL, do_lower_case=True)
to
tokenizer = BertTokenizer.from_pretrained("./albert_base")
BERT_MODEL = BertModel.from_pretrained("./albert_base")
Then I got this error:
File "train.py", line 158, in main
bundles.append(convert_question_to_samples_bundle(tokenizer, data))
File "/home/shao/CogQA/data.py", line 187, in convert_question_to_samples_bundle
ids.append(tokenizer.convert_tokens_to_ids(tokenized_all))
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization.py", line 121, in convert_tokens_to_ids
ids.append(self.vocab[token])
KeyError: '[CLS]'
What could be causing this while the data is being loaded? Looking forward to your reply!
Reply (by email, quoting the original message from ShaoaAllen, Saturday, January 9, 2021, 7:15 PM, THUDM/CogQA issue #35):
This is probably caused by ALBERT's tokenizer vocabulary being different from BERT's.
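To make the reply concrete: in the traceback, `convert_tokens_to_ids` fails at `self.vocab[token]`, a plain mapping lookup, so `KeyError: '[CLS]'` means the vocabulary `BertTokenizer` loaded from `./albert_base` has no entry that is exactly `[CLS]`. ALBERT uses a SentencePiece vocabulary rather than BERT's WordPiece one, so the two are not interchangeable. The sketch below is a hypothetical pre-flight check, not code from CogQA or pytorch_pretrained_bert; it assumes a BERT-style vocabulary file with one token per line, and the helper names and the `("[CLS]", "[SEP]")` token list are illustrative assumptions (the traceback only confirms `[CLS]`).

```python
import collections
import os
import tempfile
from typing import Dict, Iterable, List

# Assumption: special tokens the data pipeline inserts (the traceback only confirms "[CLS]").
REQUIRED_SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def load_wordpiece_vocab(vocab_path: str) -> Dict[str, int]:
    """Hypothetical helper: map each stripped line of a one-token-per-line vocab file to its index."""
    vocab = collections.OrderedDict()  # type: Dict[str, int]
    with open(vocab_path, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            vocab[line.strip()] = index
    return vocab


def missing_special_tokens(vocab: Dict[str, int],
                           required: Iterable[str] = REQUIRED_SPECIAL_TOKENS) -> List[str]:
    """Return the required tokens on which a plain vocab[token] lookup would raise KeyError."""
    return [token for token in required if token not in vocab]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        # Illustrative file contents only, not a real ALBERT vocabulary.
        with open(path, "w", encoding="utf-8") as f:
            f.write("<pad>\n<unk>\nhello\n")
        print(missing_special_tokens(load_wordpiece_vocab(path)))  # ['[CLS]', '[SEP]']
```

Running a check like this on whatever vocabulary file `BertTokenizer` picks up from `./albert_base`, before building samples, would turn the `KeyError` deep inside `convert_tokens_to_ids` into an immediate, explicit message.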
I imported ALBERT with these two lines:
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2', do_lower_case=True)
BERT_MODEL = AlbertModel.from_pretrained('albert-base-v2')
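One thing worth checking in these two lines: `BERT_MODEL` was originally the model name string `'bert-base-uncased'`, and the traceback that follows shows a `from_pretrained` call in `pytorch_pretrained_bert` (train.py line 179) passing its model argument on to `cached_path` and then `urlparse`. The `_coerce_args`/`_decode_args` frames are the branch `urllib.parse` takes only when its argument is not a `str`, which is consistent with an `AlbertModel` object reaching a parameter that expects a name or path (the start of line 179 is not shown, so this is an inference). Note also that `AlbertTokenizer` and `AlbertModel` are presumably imported from Hugging Face `transformers`, while the failing call is in `pytorch_pretrained_bert`, which has no ALBERT classes. A minimal sketch, assuming you keep the identifier and the loaded model in separate variables; `require_model_identifier` is a hypothetical helper, not part of either library.

```python
import os


def require_model_identifier(value: object) -> str:
    """Hypothetical guard: accept only a model name or a filesystem path.

    from_pretrained-style loaders take an identifier such as 'bert-base-uncased'
    or './albert_base'; an already-loaded model object is rejected here instead
    of failing later inside urllib.parse.
    """
    if isinstance(value, os.PathLike):
        value = os.fspath(value)  # e.g. pathlib.Path -> str
    if not isinstance(value, str):
        raise TypeError(
            "expected a model name or path, got {}; keep the identifier and the "
            "loaded model in separate variables".format(type(value).__name__)
        )
    return value


if __name__ == "__main__":
    BERT_MODEL = "bert-base-uncased"  # the identifier stays a plain string
    print(require_model_identifier(BERT_MODEL))  # bert-base-uncased

    class LoadedModel:  # stand-in for an object returned by some from_pretrained call
        pass

    try:
        require_model_identifier(LoadedModel())
    except TypeError as err:
        print("rejected:", err)
```

The design choice is simply to fail at the call site with a readable message; whether CogQA's training code can then use an ALBERT model at all is a separate question, since its model classes are built on `pytorch_pretrained_bert`.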
Then I got this error:
File "train.py", line 179, in main
cache_dir=PYTORCH_PRETRAINED_BERT_CACHE / 'distributed_{}'.format(-1))
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 566, in from_pretrained
resolved_archive_file = cached_path(archive_file, cache_dir=cache_dir)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/file_utils.py", line 102, in cached_path
parsed = urlparse(url_or_filename)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 367, in urlparse
url, scheme, _coerce_result = _coerce_args(url, scheme)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 123, in _coerce_args
return _decode_args(args) + (_encode_result,)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 107, in _decode_args
return tuple(x.decode(encoding, errors) if x else '' for x in args)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 107, in <genexpr>