transformers
Official guide for the sequence classification task is broken; it reports an error
System Info
Latest transformers version, Ubuntu 22.04 - https://huggingface.co/docs/transformers/en/tasks/sequence_classification
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I used all the same commands from the guide here - https://huggingface.co/docs/transformers/en/tasks/sequence_classification
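For reference, this is roughly what I ran, condensed from the guide (IMDb dataset and distilbert/distilbert-base-uncased, as in the guide; exact cells may differ slightly). Note that the tokenizer call has no truncation argument, which matters for the error below:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

imdb = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

def preprocess_function(examples):
    # NOTE: no truncation here -- see the size-mismatch error below
    return tokenizer(examples["text"])

tokenized_imdb = imdb.map(preprocess_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased",
    num_labels=2, id2label=id2label, label2id=label2id,
)

training_args = TrainingArguments(
    output_dir="my_awesome_model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    push_to_hub=False,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_imdb["train"],
    eval_dataset=tokenized_imdb["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    # compute_metrics as in the guide (accuracy via evaluate)
)
trainer.train()
```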
Expected behavior
I expected the model to train properly, but it does not work; I get this error when I run trainer.train():
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:436: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
warnings.warn(
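(The FutureWarning above is unrelated to the crash. For anyone constructing an Accelerator directly, the replacement the warning asks for looks roughly like this; the Trainer builds its Accelerator internally, so nothing needs to change in the guide's code:)

```python
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# New-style configuration replacing the deprecated keyword arguments
dataloader_config = DataLoaderConfiguration(
    dispatch_batches=None,
    split_batches=False,
    even_batches=True,
    use_seedable_sampler=True,
)
accelerator = Accelerator(dataloader_config=dataloader_config)
```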
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[12], line 32
9 training_args = TrainingArguments(
10 output_dir="my_awesome_model",
11 learning_rate=2e-5,
(...)
19 push_to_hub=False,
20 )
22 trainer = Trainer(
23 model=model,
24 args=training_args,
(...)
29 compute_metrics=compute_metrics,
30 )
---> 32 trainer.train()
34 trainer.save_model("/workspace/mymodel/")
File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1780, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1778 hf_hub_utils.enable_progress_bars()
1779 else:
-> 1780 return inner_training_loop(
1781 args=args,
1782 resume_from_checkpoint=resume_from_checkpoint,
1783 trial=trial,
1784 ignore_keys_for_eval=ignore_keys_for_eval,
1785 )
File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2118, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
2115 self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
2117 with self.accelerator.accumulate(model):
-> 2118 tr_loss_step = self.training_step(model, inputs)
2120 if (
2121 args.logging_nan_inf_filter
2122 and not is_torch_xla_available()
2123 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
2124 ):
2125 # if loss is nan or inf simply add the average of previous logged losses
2126 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:3036, in Trainer.training_step(self, model, inputs)
3033 return loss_mb.reduce_mean().detach().to(self.args.device)
3035 with self.compute_loss_context_manager():
-> 3036 loss = self.compute_loss(model, inputs)
3038 if self.args.n_gpu > 1:
3039 loss = loss.mean() # mean() to average on multi-gpu parallel training
File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:3059, in Trainer.compute_loss(self, model, inputs, return_outputs)
3057 else:
3058 labels = None
-> 3059 outputs = model(**inputs)
3060 # Save past state if it exists
3061 # TODO: this needs to be fixed and made cleaner later.
3062 if self.args.past_index >= 0:
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /usr/local/lib/python3.10/dist-packages/transformers/models/distilbert/modeling_distilbert.py:1002, in DistilBertForSequenceClassification.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
994 r"""
995 labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
996 Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
997 config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
998 `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
999 """
1000 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1002 distilbert_output = self.distilbert(
1003 input_ids=input_ids,
1004 attention_mask=attention_mask,
1005 head_mask=head_mask,
1006 inputs_embeds=inputs_embeds,
1007 output_attentions=output_attentions,
1008 output_hidden_states=output_hidden_states,
1009 return_dict=return_dict,
1010 )
1011 hidden_state = distilbert_output[0] # (bs, seq_len, dim)
1012 pooled_output = hidden_state[:, 0] # (bs, dim)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /usr/local/lib/python3.10/dist-packages/transformers/models/distilbert/modeling_distilbert.py:814, in DistilBertModel.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
811 # Prepare head mask if needed
812 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
--> 814 embeddings = self.embeddings(input_ids, inputs_embeds) # (bs, seq_length, dim)
816 if self._use_flash_attention_2:
817 attention_mask = attention_mask if (attention_mask is not None and 0 in attention_mask) else None
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /usr/local/lib/python3.10/dist-packages/transformers/models/distilbert/modeling_distilbert.py:156, in Embeddings.forward(self, input_ids, input_embeds)
152 position_ids = position_ids.unsqueeze(0).expand_as(input_ids) # (bs, max_seq_length)
154 position_embeddings = self.position_embeddings(position_ids) # (bs, max_seq_length, dim)
--> 156 embeddings = input_embeds + position_embeddings # (bs, max_seq_length, dim)
157 embeddings = self.LayerNorm(embeddings) # (bs, max_seq_length, dim)
158 embeddings = self.dropout(embeddings) # (bs, max_seq_length, dim)
RuntimeError: The size of tensor a (1232) must match the size of tensor b (512) at non-singleton dimension 1
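The mismatch (1232 vs. 512) points at an untruncated example: DistilBERT has only 512 position embeddings, so any tokenized sequence longer than that fails when the position embeddings are added. Tokenizing with truncation=True avoids it (a sketch, adjusting the preprocess_function above):

```python
def preprocess_function(examples):
    # Truncate to the model's maximum length (512 for DistilBERT) so the
    # position-embedding addition never sees a longer sequence
    return tokenizer(examples["text"], truncation=True)

tokenized_imdb = imdb.map(preprocess_function, batched=True)
```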
It worked when I used the notebook (https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/pytorch/sequence_classification.ipynb) that goes along with the text - it seems the notebook is more up to date, or I simply made some mistakes when copy-pasting into my own notebook.
Solved - explained in the comment above.
Reopening, I can reproduce. It's something that is wrong on main. cc @ArthurZucker @younesbelkada
@stevhliu do you want to have a look?
Hmm, I'm unable to reproduce the error from the linked notebook and the current guide on main (see attached notebook).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.