
Error when loading input data after replacing BERT with ALBERT (THUDM/CogQA #35)

Open ShaoaAllen opened this issue 4 years ago • 2 comments

Hello! While modifying the model code, I tried to replace BERT with ALBERT. I changed

BERT_MODEL = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(BERT_MODEL, do_lower_case=True)

to

tokenizer = BertTokenizer.from_pretrained("./albert_base")
BERT_MODEL = BertModel.from_pretrained("./albert_base")

This then raised the following error:

File "train.py", line 158, in main
    bundles.append(convert_question_to_samples_bundle(tokenizer, data))
File "/home/shao/CogQA/data.py", line 187, in convert_question_to_samples_bundle
    ids.append(tokenizer.convert_tokens_to_ids(tokenized_all))
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization.py", line 121, in convert_tokens_to_ids
    ids.append(self.vocab[token])
KeyError: '[CLS]'

What in the data loading step could be causing this? Looking forward to your reply!

ShaoaAllen avatar Jan 09 '21 11:01 ShaoaAllen

This is most likely caused by ALBERT's tokenizer vocabulary being different from BERT's.

Sleepychord avatar Jan 09 '21 11:01 Sleepychord
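
For readers following the vocabulary explanation above: the traceback ends in a plain dictionary lookup, `self.vocab[token]`, so any token missing from the loaded vocabulary raises `KeyError`. Below is a minimal sketch of checking for the special tokens before converting. The toy vocabulary and both helper functions are hypothetical illustrations, not code from CogQA or `pytorch_pretrained_bert`.

```python
def missing_special_tokens(vocab, special_tokens=("[CLS]", "[SEP]")):
    """Return the special tokens that are absent from a token -> id mapping."""
    return [tok for tok in special_tokens if tok not in vocab]


def convert_tokens_to_ids(vocab, tokens):
    """Mimic the dictionary lookup in the traceback; raises KeyError on unknown tokens."""
    return [vocab[tok] for tok in tokens]


# Toy vocabulary standing in for one loaded from a mismatched vocab file
# (hypothetical contents, chosen only to trigger the same failure mode).
toy_vocab = {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3}

missing = missing_special_tokens(toy_vocab)
if missing:
    print("vocabulary lacks special tokens:", missing)
else:
    print(convert_tokens_to_ids(toy_vocab, ["[CLS]", "hello", "[SEP]"]))
```

Running a check like this right after loading the tokenizer would show whether the vocabulary file is the one the tokenizer class expects, before the data conversion step fails.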

I brought ALBERT in with these two lines:

tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2', do_lower_case=True)
BERT_MODEL = AlbertModel.from_pretrained('albert-base-v2')

It then raised this error:

File "train.py", line 179, in main
    cache_dir=PYTORCH_PRETRAINED_BERT_CACHE / 'distributed_{}'.format(-1))
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 566, in from_pretrained
    resolved_archive_file = cached_path(archive_file, cache_dir=cache_dir)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/pytorch_pretrained_bert/file_utils.py", line 102, in cached_path
    parsed = urlparse(url_or_filename)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 367, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 123, in _coerce_args
    return _decode_args(args) + (_encode_result,)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 107, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/urllib/parse.py", line 107, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
File "/home/shao/anaconda3/envs/cogqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'AlbertModel' object has no attribute 'decode'

Does this `decode` refer to a decoder layer, or to some other kind of decode? In principle ALBERT should have encoder and decoder layers just like BERT, so why does this error occur?

ShaoaAllen avatar Jan 11 '21 01:01 ShaoaAllen
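
On the `decode` question above: going by the traceback, the failing call is `x.decode(encoding, errors)` inside `urllib.parse`, reached through `cached_path` and `urlparse`. That looks like the bytes-to-string `decode` method rather than a decoder layer (BERT and ALBERT are both encoder-only models). `urlparse` calls `.decode` on any argument that is not a `str`, which would happen if `BERT_MODEL`, now bound to a model object, is still passed where a model name string is expected. The sketch below reproduces that behaviour with a stand-in class; `StandInModel` and `describe_urlparse_failure` are hypothetical names, not part of CogQA or any library.

```python
from urllib.parse import urlparse


class StandInModel:
    """Hypothetical stand-in for a model object; it has no .decode method."""


def describe_urlparse_failure(value):
    """Pass value to urlparse and return the error message, or None on success."""
    try:
        urlparse(value)
    except AttributeError as exc:
        return str(exc)
    return None


# A string (model name or path) parses without error.
print(describe_urlparse_failure("albert-base-v2"))

# A non-str object is handled like bytes, so urlparse calls .decode on it.
print(describe_urlparse_failure(StandInModel()))
```

If this reading is right, keeping the model name string and the model instance in separate variables would avoid this particular error. It would not by itself make the rest of the pipeline compatible with ALBERT.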