
[Wav2Vec2] Cannot load newly added Wav2Vec2 checkpoints

Open patrickvonplaten opened this issue 2 years ago • 15 comments

🐛 Bug

A recent commit, https://github.com/pytorch/fairseq/commit/2513524a1604dbafcc4ea9cc5a99ae0aa4f19694, added two new fine-tuned Wav2Vec2 checkpoints; however, there seems to be a problem with the saved config, as those checkpoints cannot be loaded. E.g., the following code fails:

import fairseq
model, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task([checkpoint_path], arg_overrides={"data": "path/to/dict"})

To Reproduce

The following colab reproduces the error (one just has to run all cells): https://colab.research.google.com/drive/13hJI4w8pOD33hxOJ_qwKkN9QqdKVH5IM?usp=sharing

Kindly pinging @alexeib here :-)

patrickvonplaten avatar Aug 18 '21 11:08 patrickvonplaten

@patrickvonplaten I think this is fixed on the master branch. I changed the install line to

!pip install git+https://github.com/pytorch/fairseq.git@master

abodacs avatar Aug 18 '21 23:08 abodacs

Still got the same problem ;-)

See https://colab.research.google.com/drive/13hJI4w8pOD33hxOJ_qwKkN9QqdKVH5IM?usp=sharing

patrickvonplaten avatar Aug 19 '21 09:08 patrickvonplaten

@patrickvonplaten Here is a notebook with a working example; I also removed `arg_overrides`:

https://colab.research.google.com/drive/1gPQ1LzAoEbQtRYRGVPz4zg9Klo-ErwjH?usp=sharing

abodacs avatar Aug 20 '21 16:08 abodacs

@patrickvonplaten Hi, I ran into the same problem. Do you have a solution? Thank you. When I run the code:

model, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])

I got the error:

ConfigKeyError: Key 'target_dict' not in 'AudioPretrainingConfig'
	full_key: target_dict
	reference_type=Optional[AudioPretrainingConfig]
	object_type=AudioPretrainingConfig

ag027592 avatar Sep 05 '21 19:09 ag027592

Bump, I got a similar error.

Neither target_dict nor eval_wer makes sense in AudioPretrainingConfig, since pretraining does not use text. Has anyone found a solution?

omegaconf.errors.ConfigKeyError: Key 'eval_wer' not in 'AudioPretrainingConfig'
        full_key: eval_wer
        reference_type=Optional[AudioPretrainingConfig]
        object_type=AudioPretrainingConfig

effendijohanes avatar Feb 18 '22 02:02 effendijohanes
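For context, the error comes from strict structured configs: the checkpoint declares the pretraining task, so the saved config is validated against AudioPretrainingConfig, and any key that dataclass does not declare (eval_wer, target_dict, ...) is rejected with a ConfigKeyError. A rough stdlib analogy — not fairseq or omegaconf code, just an illustration of the "only declared keys are allowed" behavior — is a class with a fixed `__slots__` set:

```python
# Toy stand-in for a strict structured config: only declared keys exist.
# This is an analogy, not actual fairseq/omegaconf code.
class StrictPretrainingCfg:
    __slots__ = ("data", "sample_rate")


cfg = StrictPretrainingCfg()
cfg.data = "/manifests"

try:
    # Undeclared key, like 'eval_wer' on AudioPretrainingConfig.
    cfg.eval_wer = True
except AttributeError as exc:
    print(f"rejected: {exc}")
```

Just as here the fix is to use a class that declares the attribute, the fix in fairseq is to validate against the config that actually declares those keys (AudioFinetuningConfig).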

If you're interested, the robust models are otherwise also available on the HF Hub: https://huggingface.co/models?arxiv=arxiv:2104.01027

patrickvonplaten avatar Feb 18 '22 12:02 patrickvonplaten

I am still getting this error

ConfigKeyError: Key 'eval_wer' not in 'AudioPretrainingConfig'
	full_key: eval_wer
	reference_type=Optional[AudioPretrainingConfig]
	object_type=AudioPretrainingConfig

while running the code

import torch
import fairseq
cp_path = '../w2v_large_lv_fsh_swbd_cv_ftsb300_updated.pt'
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])
model = model[0]
model.eval()

Environment (pip list):

Package                Version         Location
---------------------- --------------- --------------------------------------------
antlr4-python3-runtime 4.8
backcall               0.2.0
bitarray               2.3.7
certifi                2021.10.8
cffi                   1.15.0
colorama               0.4.4
Cython                 0.29.28
debugpy                1.5.1
decorator              5.1.1
entrypoints            0.3
fairseq                1.0.0a0+5175fd5 
hydra-core             1.0.7
ipykernel              6.4.1
ipython                7.31.1
ipython-genutils       0.2.0
jedi                   0.18.1
jupyter-client         7.1.2
jupyter-core           4.9.1
matplotlib-inline      0.1.2
nest-asyncio           1.5.1
numpy                  1.22.2
omegaconf              2.0.6
parso                  0.8.3
pexpect                4.8.0
pickleshare            0.7.5
pip                    21.2.4
portalocker            2.4.0
prompt-toolkit         3.0.20
protobuf               3.19.4
ptyprocess             0.7.0
pycparser              2.21
Pygments               2.11.2
python-dateutil        2.8.2
PyYAML                 6.0
pyzmq                  22.3.0
regex                  2022.1.18
sacrebleu              2.0.0
setuptools             58.0.4
six                    1.16.0
tabulate               0.8.9
tensorboardX           2.5
torch                  1.10.2
torchaudio             0.10.2
tornado                6.1
tqdm                   4.62.3
traitlets              5.1.1
typing_extensions      4.1.1
wcwidth                0.2.5
wheel                  0.37.1

amiyamandal-dev avatar Feb 24 '22 11:02 amiyamandal-dev

I am still getting this error

ConfigKeyError: Key 'eval_wer' not in 'AudioPretrainingConfig'

Same problem here, have you solved it?

Kristopher-Chen avatar Mar 01 '22 08:03 Kristopher-Chen

You can solve this by cloning the repo and then copying all the missing parameters from the audio fine-tuning config into the audio pretraining config.

patrickvonplaten avatar Mar 01 '22 10:03 patrickvonplaten

You can solve this by cloning the repo, and then just copying all those missing parameters from the audio fine-tuning config into the audio pretraining config

I did not quite follow this... shouldn't the config be included in the checkpoint?

Kristopher-Chen avatar Mar 01 '22 11:03 Kristopher-Chen

@patrickvonplaten Hi, I think I got it. I copied the parameters from fairseq/fairseq/tasks/audio_finetuning.py (AudioFinetuningConfig) to audio_pretraining.py (AudioPretrainingConfig). Is that what you meant?

One more question: the frame size seems to be 20 ms, is that the case? I could not find where this is configured.

Thank you a lot!

Kristopher-Chen avatar Mar 01 '22 12:03 Kristopher-Chen

@patrickvonplaten, I have updated AudioPretrainingConfig with the AudioFinetuningConfig attributes that were missing and causing the error. But now another attribute, target_dict, is missing, and I get the error

ConfigKeyError: Key 'target_dict' not in 'AudioPretrainingConfig'
	full_key: target_dict
	reference_type=Optional[AudioPretrainingConfig]
	object_type=AudioPretrainingConfig

I searched for it: target_dict is used in examples/wav2vec/unsupervised/models/wav2vec_u.py, where it is passed into the init. But where can I pass it, or create the class attribute, in AudioPretrainingConfig?

amiyamandal-dev avatar Mar 22 '22 12:03 amiyamandal-dev

+1 for eval_wer / AudioPretrainingConfig issue using wav2vec2.

keunwoochoi avatar Dec 13 '22 05:12 keunwoochoi

@keunwoochoi eval_wer is not available in the pretraining task; override the task to fine-tuning, like this:

import ast

model_override_rules = ast.literal_eval(args.model_overrides)
model_override_rules['task'] = {'_name': 'audio_finetuning'}
models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    [cp_path],
    arg_overrides=model_override_rules,
)

effendijohanes avatar Dec 13 '22 07:12 effendijohanes
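The override above can be assembled into a self-contained sketch. The helper names and the `cp_path` placeholder below are my own (not fairseq API), and the fairseq import is deferred so defining the helpers needs nothing installed:

```python
def finetuning_task_overrides(extra=None):
    """Build arg_overrides that retag the checkpoint's task as
    audio_finetuning, so keys like eval_wer resolve against
    AudioFinetuningConfig instead of AudioPretrainingConfig."""
    overrides = dict(extra or {})
    overrides["task"] = {"_name": "audio_finetuning"}
    return overrides


def load_finetuned(cp_path):
    # Deferred import: fairseq is only required when actually loading.
    from fairseq import checkpoint_utils

    models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
        [cp_path], arg_overrides=finetuning_task_overrides()
    )
    return models[0].eval(), cfg, task
```

Usage would then be e.g. `model, cfg, task = load_finetuned("w2v_large_lv_fsh_swbd_cv_ftsb300_updated.pt")`.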

You can load the checkpoint now, but it appears the checkpoint has the wrong task (probably created before audio_finetuning was split off from audio_pretraining), and the remedy is to update the task just as @effendijohanes pointed out. I will try to update the checkpoints at some point in the future.

alexeib avatar Dec 13 '22 07:12 alexeib
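Until the checkpoints are re-uploaded, a one-time repair of the file itself is another option. This is an untested sketch, assuming the checkpoint stores its config under `state["cfg"]` as recent fairseq checkpoints do; the function names are hypothetical:

```python
def retag_task(state, new_task="audio_finetuning"):
    """Point a loaded checkpoint's saved task config at the
    fine-tuning task, in place, and return the state dict."""
    state["cfg"]["task"]["_name"] = new_task
    return state


def patch_checkpoint(in_path, out_path):
    # Deferred import: torch is only needed when touching the .pt file.
    import torch

    state = torch.load(in_path, map_location="cpu")
    torch.save(retag_task(state), out_path)
```

After patching, the checkpoint should load with a plain `load_model_ensemble_and_task([out_path])`, with no per-call overrides needed.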