CogQA: Error when loading input data after replacing BERT with ALBERT

Hello! While trying to improve the model code, I attempted to replace BERT with ALBERT. I changed

BERT_MODEL = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(BERT_MODEL, do_lower_case=True)

to

tokenizer = BertTokenizer.from_pretrained("./albert_base")
BERT_MODEL = BertModel.from_pretrained("./albert_base")

This then raises the following error:

  File "train.py", line 158, in main
    bundles.append(convert_question_to_samples_bundle(tokenizer, data))
  File "/home/shao/CogQA/data.py", line 187, in convert_question_to_samples_bundle
    ids.append(tokenizer.convert_tokens_to_ids(tokenized_all))
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization.py", line 121, in convert_tokens_to_ids
    ids.append(self.vocab[token])
KeyError: '[CLS]'

What could be going wrong when the data is loaded? Looking forward to your reply!

Jan 09 '21 11:01 ShaoaAllen

This is probably caused by ALBERT's tokenizer vocabulary being different from BERT's.


Jan 09 '21 11:01 Sleepychord
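The vocabulary mismatch described above can be illustrated with a minimal sketch. It does not use the real tokenizers: the two dictionaries are hypothetical toy vocabularies, and `convert_tokens_to_ids` and `missing_special_tokens` are illustrative helpers that only mimic the plain dictionary lookup visible in the traceback (`self.vocab[token]`). For background, ALBERT checkpoints are normally distributed with a SentencePiece model rather than a WordPiece `vocab.txt`, so a vocabulary loaded by `BertTokenizer` from an ALBERT directory may not contain the entries that the data pipeline expects.

```python
# Toy stand-ins for two vocabularies; every entry here is hypothetical.
bert_like_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4}
mismatched_vocab = {"<pad>": 0, "<unk>": 1, "hello": 2}

# Special tokens that the preprocessing step inserts into every sample.
REQUIRED_SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def convert_tokens_to_ids(vocab, tokens):
    """Plain dict lookup, like the `self.vocab[token]` line in the traceback."""
    return [vocab[token] for token in tokens]


def missing_special_tokens(vocab, required=REQUIRED_SPECIAL_TOKENS):
    """List the required special tokens that the vocabulary lacks."""
    return [token for token in required if token not in vocab]


print(convert_tokens_to_ids(bert_like_vocab, ["[CLS]", "hello", "[SEP]"]))  # [2, 4, 3]
print(missing_special_tokens(mismatched_vocab))  # ['[CLS]', '[SEP]']

try:
    convert_tokens_to_ids(mismatched_vocab, ["[CLS]", "hello", "[SEP]"])
except KeyError as err:
    print("KeyError:", err)  # same failure mode as in the report
```

Running a check like `missing_special_tokens` on the loaded tokenizer's vocabulary before converting the dataset would surface the mismatch early, instead of failing in the middle of data loading.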

I brought ALBERT in with these two lines:

tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2', do_lower_case=True)
BERT_MODEL = AlbertModel.from_pretrained('albert-base-v2')

Then I got this error:

  File "train.py", line 179, in main
    cache_dir=PYTORCH_PRETRAINED_BERT_CACHE / 'distributed_{}'.format(-1))
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 566, in from_pretrained
    resolved_archive_file = cached_path(archive_file, cache_dir=cache_dir)
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/file_utils.py", line 102, in cached_path
    parsed = urlparse(url_or_filename)
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 107, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'AlbertModel' object has no attribute 'decode'

Does this `decode` refer to a decoder layer, or to some other kind of decode? In principle ALBERT should have encoder and decoder layers just like BERT, so why does this error occur?

Jan 11 '21 01:01 ShaoaAllen
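Reading the traceback above, the `decode` in question does not appear to be a decoder layer. It is the `.decode(...)` method that `urllib.parse.urlparse` calls on any argument that is not a `str`, on the assumption that it received bytes. The call chain (`from_pretrained` → `cached_path` → `urlparse`) suggests that `BERT_MODEL`, which now holds an `AlbertModel` object, was passed where a model name or path string was expected. Both BERT and ALBERT are encoder-only architectures, so no decoder layer is involved either way. The sketch below reproduces the same failure with the standard library only; `FakeModel` and `try_parse` are hypothetical stand-ins, not part of CogQA or any library.

```python
from urllib.parse import urlparse


class FakeModel:
    """Hypothetical stand-in for a loaded model object such as AlbertModel."""


def try_parse(name_or_path):
    """Return the parsed path, or the error text if parsing fails."""
    try:
        return urlparse(name_or_path).path
    except AttributeError as err:
        return "AttributeError: {}".format(err)


# A model name given as a string parses normally.
print(try_parse("albert-base-v2"))

# A non-string object is treated as bytes, so urlparse calls .decode on it,
# which the object does not have.
print(try_parse(FakeModel()))
```

Under that reading, one direction to try is keeping `BERT_MODEL` as a plain string (a model name or a local path) and storing the loaded model under a different variable name, so that code which passes `BERT_MODEL` to `from_pretrained` still receives a string.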
