`fairseq-hydra-train` is broken for legacy model architectures
🐛 Bug
The version of `hydra-core` prescribed in fairseq's `setup.py` pulls in an OmegaConf release with a bug that breaks legacy model support in `fairseq-hydra-train`. The most up-to-date version of `hydra-core` currently allowed is 1.0.7, whose OmegaConf dependency has the extremely strange behavior that a `DictConfig` object (which is what `fairseq-hydra-train` casts the command's args into) overrides `__getattr__` so that `getattr(dict_config_object, "key_not_present_in_dict_config_object", default_value)` always returns `None` instead of `default_value`. This breaks the mechanism for modifying architectures through the `register_model_architecture` system.
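A minimal sketch of the behavior (the key names here are illustrative):

```python
from omegaconf import OmegaConf

# fairseq-hydra-train turns the parsed command-line args into a DictConfig:
cfg = OmegaConf.create({"arch": "bart_large"})

# With the omegaconf release pulled in by hydra-core 1.0.7, attribute access
# on a missing key returns None instead of raising AttributeError, so
# getattr() never falls back to its default:
print(getattr(cfg, "encoder_layers", 12))  # None (should be 12)
```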
Proposed fix
Allow newer versions of `hydra-core` and `omegaconf` in `setup.py`. I upgraded to `hydra-core==1.2.0` and `omegaconf==2.2.3`, and the bug was resolved.
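Concretely, this amounts to relaxing the pins in `setup.py` along these lines (the bounds below are illustrative assumptions chosen to admit the versions I tested, not the literal diff):

```python
# setup.py (sketch): relax install_requires so the fixed releases can install.
install_requires = [
    # ...
    "hydra-core>=1.0.7,<1.3",  # admits hydra-core 1.2.0
    "omegaconf>=2.1,<2.3",     # 2.1 fixed the getattr inconsistency
    # ...
]
```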
To Reproduce
For example, in the `bart_large` architecture, none of the hyperparameter overrides are propagated, so that even if you specify `arch=bart_large`, the model actually created is just `bart_base`.
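The failure is visible directly in the architecture functions, which rely on `getattr` defaults; simplified here from `fairseq/models/bart/model.py` (only a few overrides shown):

```python
from omegaconf import OmegaConf

# Simplified from fairseq/models/bart/model.py; in fairseq this function is
# registered via @register_model_architecture("bart", "bart_large").
def bart_large_architecture(args):
    # Each line falls back to a bart_large default when the user did not set
    # the key. Under the buggy omegaconf, getattr() returns None for every
    # missing key, so the large-model defaults are never applied and the
    # model silently keeps the base-sized configuration.
    args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024)
    args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4 * 1024)
    args.encoder_layers = getattr(args, "encoder_layers", 12)
    args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16)
    # ... decoder overrides follow the same pattern ...

args = OmegaConf.create({"arch": "bart_large"})
bart_large_architecture(args)
print(args.encoder_layers)  # None under the buggy omegaconf; 12 once fixed
```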
- Using this `config.yaml` file in your current working directory:
```yaml
task:
  _name: span_masked_lm
  data: /path/to/data/binary
criterion: cross_entropy
dataset:
  batch_size: 16
  ignore_unused_valid_subsets: true
optimizer:
  _name: adam
  weight_decay: 0.01
  adam_betas: (0.9,0.98)
  adam_eps: 1e-06
lr_scheduler:
  _name: inverse_sqrt
  warmup_updates: 100000
optimization:
  clip_norm: 0.1
  lr: [1e-5]
  max_update: 5000000
  update_freq: [4]
model:
  arch: bart_large
```
- Executing `fairseq-hydra-train -m --config-dir ./ --config-name config` will successfully launch and train, but it will train only a 6-layer `bart_base` model because none of the architecture overrides were allowed to propagate through the `DictConfig` object.
Expected behavior
The above example should train a 12-layer `bart_large` model, since `arch: bart_large` is set.
Environment
- fairseq Version (e.g., 1.0 or main): d81fac8163364561fd6cd9d82b6ee1ba502c3526 (Aug 29 2022)
- PyTorch Version (e.g., 1.0): 1.12.1
- OS (e.g., Linux): Ubuntu 18.04
- How you installed fairseq (`pip`, source): source
- Build command you used (if compiling from source): `python -m pip install -e . --upgrade`
- Python version: 3.8
- CUDA/cuDNN version: 11.4
- GPU models and configuration: 4x 16GB Tesla V100
- Any other relevant information:
Had the same issue and fixed it by updating OmegaConf to version 2.1.0, following the discussion at https://github.com/omry/omegaconf/discussions/746. They fixed this inconsistency in that release, and I tested it: everything worked fine for BART pre-training. Since OmegaConf follows semantic versioning, this is only a minor version bump, so in theory updating the dependency in fairseq and hydra shouldn't be a problem.
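As a quick sanity check after upgrading (a minimal sketch):

```python
from importlib.metadata import version

from omegaconf import OmegaConf

print(version("omegaconf"))  # expect 2.1.0 or newer

# On a fixed install, missing-key attribute access raises AttributeError,
# so getattr() falls back to its default as expected:
cfg = OmegaConf.create({})
assert getattr(cfg, "missing_key", "fallback") == "fallback"
```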