fairseq
Sequence-length padding option for wav2vec2 causes TPU "Scalar type not supported" error during audio pretraining.
🐛 Bug
The sequence-length padding option for wav2vec2 causes a TPU "Scalar type not supported" error during audio pretraining. The issue was introduced by the following commit: https://github.com/pytorch/fairseq/commit/c2b771b1beed56d03134aa0a807fbbec766e484c . Commenting out the commit's changes within the TransformerEncoder object avoids this error. The pretraining configuration file used is the one provided at https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu.yaml .
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
1. Run the command:
fairseq-hydra-train --config-dir $HYDRA_CONFIG_DIR --config-name $HYDRA_CONFIG_NAME common.tensorboard_logdir=$TENSORBOARD_LOGDIR checkpoint.finetune_from_model=$TRAIN_PRETRAINED_MODEL task.data=$AUDIO_FILES_MANIFEST_DIR hydra.verbose=true
2. See the error:
loss, sample_size_i, logging_output = self.task.train_step(
File "./fairseq/fairseq/tasks/fairseq_task.py", line 512, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "./venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./fairseq/fairseq/criterions/wav2vec_criterion.py", line 53, in forward
net_output = model(**sample["net_input"])
File "./venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./fairseq/fairseq/models/wav2vec/wav2vec2.py", line 675, in forward
x, layer_results = self.encoder(x, padding_mask=padding_mask, layer=layer)
File "./venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "./fairseq/fairseq/models/wav2vec/wav2vec2.py", line 1003, in forward
x, layer_results = self.extract_features(x, padding_mask, layer)
File "./fairseq/fairseq/models/wav2vec/wav2vec2.py", line 1036, in extract_features
padding_mask, _ = pad_to_multiple(
File "./fairseq/fairseq/models/wav2vec/utils.py", line 21, in pad_to_multiple
return F.pad(x, (*pad_offset, 0, remainder), value=value), remainder
File "./venv/lib/python3.8/site-packages/torch/nn/functional.py", line 4364, in _pad
return _VF.constant_pad_nd(input, pad, value)
RuntimeError: /pytorch/xla/torch_xla/csrc/helpers.h:100 : Check failed: scalar_value.isIntegral()
Code sample
Commenting out the changes from https://github.com/pytorch/fairseq/commit/c2b771b1beed56d03134aa0a807fbbec766e484c within the TransformerEncoder object avoids this error.
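For reference, the traceback points at `pad_to_multiple` calling `F.pad` on the boolean `padding_mask`, and the XLA check `scalar_value.isIntegral()` suggests the pad scalar reaching `constant_pad_nd` is not an integral type. A minimal sketch of a possible workaround (the function name here is hypothetical, not fairseq API): pad the mask in an integer dtype with an integer fill value, then cast back to bool.

```python
import torch
import torch.nn.functional as F

def pad_mask_to_multiple(padding_mask: torch.Tensor, multiple: int):
    """Hypothetical workaround sketch: pad a boolean padding mask along the
    last dimension up to a multiple of `multiple`. Padding is done in int32
    with an integer fill value (avoiding a non-integral pad scalar, which
    XLA rejects), then cast back to bool. Returns (padded_mask, remainder)."""
    tsz = padding_mask.size(-1)
    remainder = (-tsz) % multiple
    if remainder == 0:
        return padding_mask, 0
    # Pad in an integral dtype; value=1 marks the appended positions as padding.
    padded = F.pad(padding_mask.to(torch.int32), (0, remainder), value=1)
    return padded.to(torch.bool), remainder

# Example: a (2, 10) mask padded up to the next multiple of 4 -> (2, 12)
mask = torch.zeros(2, 10, dtype=torch.bool)
padded, rem = pad_mask_to_multiple(mask, 4)
print(padded.shape, rem)
```

Whether this also satisfies the XLA check on an actual TPU would need verification; the sketch only demonstrates the dtype handling on CPU.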
Expected behavior
The padding scalar type should be compliant with TPU variable requirements, or a configuration option could be added to disable padding when running on TPU.
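In the meantime, a possible configuration-level workaround (assuming the `required_seq_len_multiple` option added by the commit above is exposed in the model config, with a default of 2): setting it to 1 should make the padding a no-op, so no pad scalar reaches XLA.

```yaml
# Hypothetical Hydra override sketch; the option name is taken from the
# commit referenced above and may differ in your fairseq revision.
model:
  required_seq_len_multiple: 1
```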
Environment
- fairseq Version: main
- PyTorch Version: 1.11
- OS: Linux
- How you installed fairseq: pip install -e .
- Python version: 3.8
- TPU models and configuration: TPU v3-8
Additional context
The pretraining configuration file used is the one provided at https://github.com/pytorch/fairseq/blob/main/examples/wav2vec/config/pretraining/wav2vec2_large_librivox_tpu.yaml .