'NoneType' object has no attribute 'convert_tokens_to_ids'

Open ShivanshuPurohit opened this issue 4 years ago • 3 comments

While running train.py I encountered this error: Model name 'model/' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'model/vocab.txt' was a path or url but couldn't find any file associated to this path or url.

Traceback (most recent call last):
  File "train.py", line 168, in <module>
    train_data = data_loader.load_data('train')
  File "/content/BERT-keyphrase-extraction/data_loader.py", line 83, in load_data
    self.load_sentences_tags(sentences_file, tags_path, data)
  File "/content/BERT-keyphrase-extraction/data_loader.py", line 51, in load_sentences_tags
    sentences.append(self.tokenizer.convert_tokens_to_ids(tokens))
AttributeError: 'NoneType' object has no attribute 'convert_tokens_to_ids'

I think it isn't registering the pytorch_model.bin file, which I directly downloaded as bert-base-uncased.tar.gz
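
The first message above says the loader looked for 'model/vocab.txt' and found nothing, which is consistent with the tokenizer ending up as None. Below is a minimal pre-flight check, assuming the model folder is expected to hold vocab.txt and pytorch_model.bin (both names come from this thread). The helper name missing_model_files is hypothetical and not part of the repo.

```python
import os

# File names taken from the error message and the comment above.
# Assumption: your checkpoint uses the same names; adjust if it differs.
REQUIRED_FILES = ("vocab.txt", "pytorch_model.bin")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    missing = missing_model_files("model/")
    if missing:
        print("Missing from model/:", ", ".join(missing))
    else:
        print("model/ contains all expected files")
```

If vocab.txt shows up as missing, the tarball was probably not unpacked into model/, or the vocab file was downloaded separately and never copied there.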

Also, when I modify the command to point to task1/train:

python train.py --data_dir data/task1/train/ --bert_model_dir model/ --model_dir experiments/base_model

the error is:

Loading the datasets...
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    data_loader = DataLoader(args.data_dir, args.bert_model_dir, params, token_pad_idx=0)
  File "/content/BERT-keyphrase-extraction/data_loader.py", line 28, in __init__
    self.tag_pad_idx = self.tag2idx['O']
KeyError: 'O'

ShivanshuPurohit, Oct 12 '20 15:10

BertTokenizer's convert_tokens_to_ids function raises a KeyError for tokens that are not in the vocabulary. I suggest modifying the for loop in that function as follows:

for token in tokens:
    ids.append(self.vocab.get(token, self.vocab['[UNK]']))
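
As a standalone illustration of the same idea, here is a minimal sketch of a lookup that falls back to the [UNK] id instead of raising KeyError. The function is a simplified stand-in, not the library's actual method, and the four-entry vocabulary is made up for the example.

```python
def convert_tokens_to_ids(tokens, vocab, unk_token="[UNK]"):
    """Map tokens to ids, sending out-of-vocabulary tokens to the [UNK] id."""
    unk_id = vocab[unk_token]  # the vocabulary itself must contain [UNK]
    return [vocab.get(token, unk_id) for token in tokens]


# Toy vocabulary, invented for illustration only.
vocab = {"[PAD]": 0, "[UNK]": 1, "bert": 2, "model": 3}

ids = convert_tokens_to_ids(["bert", "keyphrase", "model"], vocab)
print(ids)  # "keyphrase" is not in the toy vocabulary, so it maps to 1
```

The trade-off is that unknown words are silently collapsed into one id, so it may be worth logging how often the fallback is hit.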

sahiljethani, Aug 02 '21 11:08

I used tokens = self.tokenizer.tokenize(line) instead of split().

arunmack789, Dec 26 '21 12:12

Hey, how did you complete this step: "From scibert repo, untar the weights (rename their weight dump file to pytorch_model.bin) and vocab file into a new folder model"? Can you please help with this?
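
For anyone stuck on the same step, here is a rough sketch of what "untar and rename" could look like in Python. It assumes the download is a tar archive containing a weight file and a vocab.txt somewhere inside. The function name prepare_model_dir and its weights_name parameter are hypothetical; the actual name of the weight dump file is not given in this thread, so you have to look inside the archive and pass it yourself.

```python
import os
import shutil
import tarfile


def prepare_model_dir(archive_path, weights_name, model_dir="model"):
    """Extract archive_path into model_dir, then place the weight file at
    model_dir/pytorch_model.bin and the vocab file at model_dir/vocab.txt.

    weights_name is the weight file's name inside the archive
    (an assumption you must supply; it is not known from this thread).
    """
    os.makedirs(model_dir, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(model_dir)  # only extract archives you trust

    # Source file name -> name the training script is expected to look for.
    wanted = {weights_name: "pytorch_model.bin", "vocab.txt": "vocab.txt"}

    # Collect matches first so files are not moved while walking the tree.
    moves = []
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            if name in wanted:
                src = os.path.join(root, name)
                dst = os.path.join(model_dir, wanted[name])
                moves.append((src, dst))

    for src, dst in moves:
        if os.path.abspath(src) != os.path.abspath(dst):
            shutil.move(src, dst)

    return sorted(os.listdir(model_dir))


# Example call (placeholders, not real file names):
# prepare_model_dir("path/to/downloaded_archive.tar", weights_name="<weight file name>")
```

Afterwards, model/ should contain pytorch_model.bin and vocab.txt at its top level, which is where the --bert_model_dir model/ argument in the command above points.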

hnrNeha, Jun 14 '22 03:06