
"Add sequence length padding option for wav2vec2" causes TPU "Scalar type not supported" error during audio pretraining

Open catalinnega opened this issue 2 years ago • 1 comment

🐛 Bug

The "Add sequence length padding option for wav2vec2" change causes a TPU "Scalar type not supported" error during audio pretraining. The error was introduced by the following commit: https://github.com/pytorch/fairseq/commit/c2b771b1beed56d03134aa0a807fbbec766e484c . Commenting out the commit's changes within the TransformerEncoder class avoids the error. The pretraining configuration file used is the one provided at https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu.yaml .
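For context, that commit routes the boolean padding mask through a pad_to_multiple helper, which ends in the F.pad call visible in the traceback below. A condensed sketch of that path (paraphrased from the commit, so it may not match the current source line for line):

    import math
    import torch.nn.functional as F

    def pad_to_multiple(x, multiple, dim=-1, value=0):
        # Pad x along `dim` so its length becomes a multiple of `multiple`.
        if x is None:
            return None, 0
        tsz = x.size(dim)
        m = tsz / multiple
        if m.is_integer():
            return x, 0
        remainder = math.ceil(m) * multiple - tsz
        pad_offset = (0,) * (-1 - dim) * 2
        # The bool padding mask is padded with value=True; on XLA this
        # appears to fail the scalar_value.isIntegral() check.
        return F.pad(x, (*pad_offset, 0, remainder), value=value), remainder

    # Condensed call site inside TransformerEncoder.extract_features:
    # padding_mask, _ = pad_to_multiple(padding_mask, multiple, dim=-1, value=True)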

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run the command fairseq-hydra-train --config-dir $HYDRA_CONFIG_DIR --config-name $HYDRA_CONFIG_NAME common.tensorboard_logdir=$TENSORBOARD_LOGDIR checkpoint.finetune_from_model=$TRAIN_PRETRAINED_MODEL task.data=$AUDIO_FILES_MANIFEST_DIR hydra.verbose=true

  2. See error

    loss, sample_size_i, logging_output = self.task.train_step(
  File "./fairseq/fairseq/tasks/fairseq_task.py", line 512, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "./venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/./fairseq/fairseq/criterions/wav2vec_criterion.py", line 53, in forward
    net_output = model(**sample["net_input"])
  File "./venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "./fairseq/fairseq/models/wav2vec/wav2vec2.py", line 675, in forward
    x, layer_results = self.encoder(x, padding_mask=padding_mask, layer=layer)
  File "./venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "./fairseq/fairseq/models/wav2vec/wav2vec2.py", line 1003, in forward
    x, layer_results = self.extract_features(x, padding_mask, layer)
  File "./fairseq/fairseq/models/wav2vec/wav2vec2.py", line 1036, in extract_features
    padding_mask, _ = pad_to_multiple(
  File "./fairseq/fairseq/models/wav2vec/utils.py", line 21, in pad_to_multiple
    return F.pad(x, (*pad_offset, 0, remainder), value=value), remainder
  File "./venv/lib/python3.8/site-packages/torch/nn/functional.py", line 4364, in _pad
    return _VF.constant_pad_nd(input, pad, value)
RuntimeError: /pytorch/xla/torch_xla/csrc/helpers.h:100 : Check failed: scalar_value.isIntegral()

Code sample

Commenting out the changes from https://github.com/pytorch/fairseq/commit/c2b771b1beed56d03134aa0a807fbbec766e484c within the TransformerEncoder class avoids this error.
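A minimal standalone sketch that should hit the same XLA check (my untested guess at the smallest repro; assumes a working torch_xla install):

    import torch
    import torch.nn.functional as F
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()
    mask = torch.zeros(2, 7, dtype=torch.bool, device=device)
    # Padding a bool tensor with a bool fill value mirrors what the
    # wav2vec2 encoder does internally; on TPU this should raise
    # "Check failed: scalar_value.isIntegral()".
    padded = F.pad(mask, (0, 1), value=True)
    print(padded.shape)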

Expected behavior

The padding scalar type should be compliant with TPU tensor requirements, or a configuration option could be added to disable the padding when running on TPUs.
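One possible fix sketch (an assumption on my part, not a vetted patch; _pad_bool_safe is a hypothetical helper): pad bool tensors in an integral dtype so XLA's scalar check passes, then cast back:

    import torch
    import torch.nn.functional as F

    def _pad_bool_safe(x, pad, value):
        # Hypothetical helper: F.pad on a bool tensor with a bool fill
        # value trips XLA's isIntegral() check, so pad in uint8 with an
        # integral fill value and cast the result back to bool.
        if x.dtype == torch.bool:
            return F.pad(x.to(torch.uint8), pad, value=int(value)).to(torch.bool)
        return F.pad(x, pad, value=value)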

Environment

  • fairseq Version: main
  • PyTorch Version: 1.11
  • OS: Linux
  • How you installed fairseq: pip install -e .
  • Python version: 3.8
  • TPU models and configuration: TPU v3-8

Additional context

The pretraining configuration file used is the one provided at https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu.yaml
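As a stopgap, the padding can presumably also be disabled from the command line, assuming the option the commit adds is exposed as model.required_seq_len_multiple (with a multiple of 1, pad_to_multiple returns early and F.pad is never reached):

    fairseq-hydra-train --config-dir $HYDRA_CONFIG_DIR --config-name $HYDRA_CONFIG_NAME \
        model.required_seq_len_multiple=1 \
        task.data=$AUDIO_FILES_MANIFEST_DIR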

catalinnega • Mar 25 '22