
Model training does not work on CPU

Open saurabhhssaurabh opened this issue 3 years ago • 1 comment

I have cloned the code from the dev branch and am executing the following command to fine-tune the model on CPU:

python run_ner.py --cache_dir=path_to_cache --data_dir=path_to_data --bert_model=bert-base-uncased --task_name=ner --output_dir=path_to_output --no_cuda --do_train --do_eval --warmup_proportion=0.1

But I am facing the following error:

Traceback (most recent call last):
  File "run_ner.py", line 611, in <module>
    main()
  File "run_ner.py", line 503, in main
    loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "run_ner.py", line 43, in forward
    logits = self.classifier(sequence_output)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/dev01/python_3/lib64/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

What I don't understand is: when I pass the CPU flag (--no_cuda), why does it expect a tensor to be on the GPU?
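For context, this kind of device mismatch is easy to reproduce in isolation. A hypothetical minimal snippet (the names here are made up for illustration; it needs a machine with CUDA, and the exact error wording varies by PyTorch version):

import torch

# A Linear layer left on the CPU receives a CUDA tensor, so the
# underlying matmul sees tensors on two different devices.
layer = torch.nn.Linear(4, 2)          # weights stay on the CPU
x = torch.zeros(1, 4, device='cuda')   # input forced onto the GPU
layer(x)  # RuntimeError: device mismatch, as in the traceback above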

saurabhhssaurabh · Jan 27 '21

--no_cuda hits this error in the NER task because the device is still hardcoded to the GPU in Ner.forward: with --no_cuda the model weights stay on the CPU, but valid_output is created on the GPU, so the classifier matmul later in forward mixes devices:

class Ner(BertForTokenClassification):

    def forward(self, input_ids,
                token_type_ids=None,
                attention_mask=None,
                labels=None,
                valid_ids=None,
                attention_mask_label=None):
        # ... skipping to line 47
        # The device is hardcoded here, ignoring --no_cuda:
        valid_output = torch.zeros(batch_size,
                                   max_len,
                                   feat_dim,
                                   dtype=torch.float32,
                                   device='cuda')

I changed the device argument to 'cpu' when I wasn't using CUDA, and everything worked as expected; a device-agnostic sketch is below.
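A minimal sketch of that fix, assuming you'd rather derive the device from the input tensor than hardcode a second string (input_ids is the first argument to forward):

# Build valid_output on whatever device the inputs already live on,
# so the same code path works with and without --no_cuda.
valid_output = torch.zeros(batch_size,
                           max_len,
                           feat_dim,
                           dtype=torch.float32,
                           device=input_ids.device)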

brandonrobertz · Mar 24 '22