
Official guide for the sequence classification task does not work; it reports an error

Open Oxi84 opened this issue 10 months ago • 5 comments

System Info

Latest transformers version, Ubuntu 22.04. Following the guide at https://huggingface.co/docs/transformers/en/tasks/sequence_classification

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I used all the same commands from the guide here - https://huggingface.co/docs/transformers/en/tasks/sequence_classification
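For context, the guide's pipeline boils down to roughly the following. This is a condensed sketch paraphrased from the linked page, not a verbatim copy; exact arguments (batch size, epochs, evaluation settings) may differ from the published version, and the metrics step is omitted.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

# Load the IMDb dataset used in the guide
imdb = load_dataset("imdb")

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

# The guide's preprocessing truncates sequences to the model's maximum length
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_imdb = imdb.map(preprocess_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

training_args = TrainingArguments(
    output_dir="my_awesome_model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    push_to_hub=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_imdb["train"],
    eval_dataset=tokenized_imdb["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()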

Expected behavior

I expected the model to train properly, but it does not work. I get this error when I run trainer.train():

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:436: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 32
      9 training_args = TrainingArguments(
     10     output_dir="my_awesome_model",
     11     learning_rate=2e-5,
   (...)
     19     push_to_hub=False,
     20 )
     22 trainer = Trainer(
     23     model=model,
     24     args=training_args,
   (...)
     29     compute_metrics=compute_metrics,
     30 )
---> 32 trainer.train()
     34 trainer.save_model("/workspace/mymodel/")

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1780, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1778         hf_hub_utils.enable_progress_bars()
   1779 else:
-> 1780     return inner_training_loop(
   1781         args=args,
   1782         resume_from_checkpoint=resume_from_checkpoint,
   1783         trial=trial,
   1784         ignore_keys_for_eval=ignore_keys_for_eval,
   1785     )

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2118, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2115     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   2117 with self.accelerator.accumulate(model):
-> 2118     tr_loss_step = self.training_step(model, inputs)
   2120 if (
   2121     args.logging_nan_inf_filter
   2122     and not is_torch_xla_available()
   2123     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   2124 ):
   2125     # if loss is nan or inf simply add the average of previous logged losses
   2126     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:3036, in Trainer.training_step(self, model, inputs)
   3033     return loss_mb.reduce_mean().detach().to(self.args.device)
   3035 with self.compute_loss_context_manager():
-> 3036     loss = self.compute_loss(model, inputs)
   3038 if self.args.n_gpu > 1:
   3039     loss = loss.mean()  # mean() to average on multi-gpu parallel training

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:3059, in Trainer.compute_loss(self, model, inputs, return_outputs)
   3057 else:
   3058     labels = None
-> 3059 outputs = model(**inputs)
   3060 # Save past state if it exists
   3061 # TODO: this needs to be fixed and made cleaner later.
   3062 if self.args.past_index >= 0:

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /usr/local/lib/python3.10/dist-packages/transformers/models/distilbert/modeling_distilbert.py:1002, in DistilBertForSequenceClassification.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
    994 r"""
    995 labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
    996     Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
    997     config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
    998     `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    999 """
   1000 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1002 distilbert_output = self.distilbert(
   1003     input_ids=input_ids,
   1004     attention_mask=attention_mask,
   1005     head_mask=head_mask,
   1006     inputs_embeds=inputs_embeds,
   1007     output_attentions=output_attentions,
   1008     output_hidden_states=output_hidden_states,
   1009     return_dict=return_dict,
   1010 )
   1011 hidden_state = distilbert_output[0]  # (bs, seq_len, dim)
   1012 pooled_output = hidden_state[:, 0]  # (bs, dim)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /usr/local/lib/python3.10/dist-packages/transformers/models/distilbert/modeling_distilbert.py:814, in DistilBertModel.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    811 # Prepare head mask if needed
    812 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
--> 814 embeddings = self.embeddings(input_ids, inputs_embeds)  # (bs, seq_length, dim)
    816 if self._use_flash_attention_2:
    817     attention_mask = attention_mask if (attention_mask is not None and 0 in attention_mask) else None

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /usr/local/lib/python3.10/dist-packages/transformers/models/distilbert/modeling_distilbert.py:156, in Embeddings.forward(self, input_ids, input_embeds)
    152     position_ids = position_ids.unsqueeze(0).expand_as(input_ids)  # (bs, max_seq_length)
    154 position_embeddings = self.position_embeddings(position_ids)  # (bs, max_seq_length, dim)
--> 156 embeddings = input_embeds + position_embeddings  # (bs, max_seq_length, dim)
    157 embeddings = self.LayerNorm(embeddings)  # (bs, max_seq_length, dim)
    158 embeddings = self.dropout(embeddings)  # (bs, max_seq_length, dim)

RuntimeError: The size of tensor a (1232) must match the size of tensor b (512) at non-singleton dimension 1

Oxi84 avatar Apr 07 '24 14:04 Oxi84
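Two notes on the output above. The "newly initialized" classifier-weights warning is expected when fine-tuning a fresh classification head and is not the problem. The RuntimeError, however, says that a batch contains a 1232-token sequence while DistilBERT's position-embedding table only has 512 entries (max_position_embeddings=512), so the input embeddings and position embeddings cannot be added. That size mismatch typically means the texts were tokenized without truncation. A minimal sketch of the usual fix, assuming the guide's preprocess_function and the imdb dataset are in scope:

# Truncate inputs to the model's maximum length (512 for DistilBERT)
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)

tokenized_imdb = imdb.map(preprocess_function, batched=True)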

It worked when I used the notebook (https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/pytorch/sequence_classification.ipynb) that goes along with the guide. It seems the notebook is more up to date, or I simply made some mistakes when copy-pasting into my own notebook.

Oxi84 avatar Apr 07 '24 15:04 Oxi84

Solved - explained in the comment above.

Oxi84 avatar Apr 07 '24 15:04 Oxi84

Reopening, I can reproduce. It's something that is wrong on main. cc @ArthurZucker @younesbelkada

muellerzr avatar Apr 10 '24 16:04 muellerzr

@stevhliu do you want to have a look?

ArthurZucker avatar May 23 '24 07:05 ArthurZucker

Hmm, I'm unable to reproduce the error from the linked notebook and the current guide on main (see attached notebook)

stevhliu avatar May 23 '24 17:05 stevhliu
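For anyone who still hits this, a quick sanity check is to compare the longest tokenized sequence against the model's limit. This assumes the tokenized_imdb dataset and tokenizer from the guide are in scope:

# If the longest tokenized example exceeds the limit, truncation was not applied
lengths = [len(ids) for ids in tokenized_imdb["train"]["input_ids"]]
print("longest sequence:", max(lengths))
print("model limit:", tokenizer.model_max_length)  # 512 for distilbert-base-uncased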

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 17 '24 08:06 github-actions[bot]