
Error during Training on private dataset

eaedk opened this issue on Jan 20, 2022 · 0 comments

Morning, I used your notebook Speech Emotion Recognition (Wav2Vec 2.0) with another dataset and got an error during training. Could you help me, please? The code and the error are just below.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=finetune_output_dir,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    evaluation_strategy="steps",  # or "epoch"
    gradient_accumulation_steps=1,
    num_train_epochs=50,
    fp16=True,
    save_steps=10,  # n_steps
    eval_steps=10,  # n_steps
    logging_steps=10,
    learning_rate=1e-4,
    save_total_limit=10,
)

trainer = CTCTrainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=processor.feature_extractor,
)

trainer.train()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-3435b262f1ae> in <module>
----> 1 trainer.train()

/anaconda/envs/azureml_py38_pytorch/lib/python3.8/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1330                         tr_loss_step = self.training_step(model, inputs)
   1331                 else:
-> 1332                     tr_loss_step = self.training_step(model, inputs)
   1333 
   1334                 if (

<ipython-input-29-878b4353167f> in training_step(self, model, inputs)
     43         if self.use_amp:
     44             with autocast():
---> 45                 loss = self.compute_loss(model, inputs)
     46         else:
     47             loss = self.compute_loss(model, inputs)

/anaconda/envs/azureml_py38_pytorch/lib/python3.8/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
   1921         else:
   1922             labels = None
-> 1923         outputs = model(**inputs)
   1924         # Save past state if it exists
   1925         # TODO: this needs to be fixed and made cleaner later.

/anaconda/envs/azureml_py38_pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

<ipython-input-16-dd9fe3ea0f13> in forward(self, input_values, attention_mask, output_attentions, output_hidden_states, return_dict, labels)
     70     ):
     71         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
---> 72         outputs = self.wav2vec2(
     73             input_values,
     74             attention_mask=attention_mask,

/anaconda/envs/azureml_py38_pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/anaconda/envs/azureml_py38_pytorch/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in forward(self, input_values, attention_mask, mask_time_indices, output_attentions, output_hidden_states, return_dict)
   1285 
   1286         hidden_states, extract_features = self.feature_projection(extract_features)
-> 1287         hidden_states = self._mask_hidden_states(
   1288             hidden_states, mask_time_indices=mask_time_indices, attention_mask=attention_mask
   1289         )

/anaconda/envs/azureml_py38_pytorch/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in _mask_hidden_states(self, hidden_states, mask_time_indices, attention_mask)
   1228             hidden_states[mask_time_indices] = self.masked_spec_embed.to(hidden_states.dtype)
   1229         elif self.config.mask_time_prob > 0 and self.training:
-> 1230             mask_time_indices = _compute_mask_indices(
   1231                 (batch_size, sequence_length),
   1232                 mask_prob=self.config.mask_time_prob,

/anaconda/envs/azureml_py38_pytorch/lib/python3.8/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py in _compute_mask_indices(shape, mask_prob, mask_length, attention_mask, min_masks)
    240 
    241         # get random indices to mask
--> 242         spec_aug_mask_idx = np.random.choice(
    243             np.arange(input_length - (mask_length - 1)), num_masked_span, replace=False
    244         )

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: Cannot take a larger sample than population when 'replace=False'
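
For anyone hitting the same traceback: this ValueError usually means some clips in the dataset are shorter than Wav2Vec 2.0's SpecAugment time-mask span, so _compute_mask_indices ends up asking np.random.choice for more mask positions than there are frames to sample without replacement. Below is a minimal sketch of two common workarounds, assuming a datasets.Dataset with an input_values column and the model from the snippet above; the 1-second threshold is an illustrative assumption, not a value from the notebook.

# Workaround sketch (assumptions noted inline, not fixes from the original notebook)

MIN_SAMPLES = 16_000  # assumed 16 kHz audio: drop clips shorter than ~1 second

def long_enough(example):
    # "input_values" holds the raw waveform produced by the feature extractor
    return len(example["input_values"]) >= MIN_SAMPLES

train_dataset = train_dataset.filter(long_enough)
eval_dataset = eval_dataset.filter(long_enough)

# Alternatively, disable time masking so _compute_mask_indices is never
# called during training (see the `mask_time_prob > 0` check in the traceback):
model.config.mask_time_prob = 0.0

Filtering keeps SpecAugment's regularization for the remaining clips, while setting mask_time_prob to 0.0 trades that regularization away in exchange for tolerating arbitrarily short inputs.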
