`fairseq-hydra-train` is broken for legacy model architectures
🐛 Bug
The version of `hydra-core` prescribed in fairseq's `setup.py` pulls in an OmegaConf release with a bug that breaks legacy model support in `fairseq-hydra-train`. The most up-to-date version of `hydra-core` currently allowed is 1.0.7, whose OmegaConf dependency has the extremely strange behavior that a `DictConfig` object (which is what `fairseq-hydra-train` casts the command's args into) overrides `__getattr__` so that `getattr(dict_config_object, "key_not_present_in_dict_config_object", default_value)` always returns `None` instead of `default_value`. This breaks the mechanism for modifying architectures through the `register_model_architecture` system.
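A minimal sketch of the behavior (the key names here are illustrative):

```python
from omegaconf import OmegaConf

# fairseq-hydra-train turns the parsed command-line args into a DictConfig:
cfg = OmegaConf.create({"arch": "bart_large"})

# With the omegaconf release pulled in by hydra-core 1.0.7, attribute access
# on a missing key returns None instead of raising AttributeError, so
# getattr() never falls back to its default:
print(getattr(cfg, "encoder_layers", 12))  # None (should be 12)
```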
Proposed fix
Allow newer versions of `hydra-core` and `omegaconf` in `setup.py`. I upgraded to `hydra-core==1.2.0` and `omegaconf==2.2.3`, and the bug was resolved.
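Concretely, this amounts to relaxing the pins in `setup.py` along these lines (the bounds below are illustrative assumptions chosen to admit the versions I tested, not the literal diff):

```python
# setup.py (sketch): relax install_requires so the fixed releases can install.
install_requires = [
    # ...
    "hydra-core>=1.0.7,<1.3",  # admits hydra-core 1.2.0
    "omegaconf>=2.1,<2.3",     # 2.1 fixed the getattr inconsistency
    # ...
]
```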
To Reproduce
For example, in the `bart_large` architecture, none of the hyperparameter overrides are propagated, so that even if you specify `arch=bart_large`, the model actually created is just `bart_base`.
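The failure is visible directly in the architecture functions, which rely on `getattr` defaults; simplified here from `fairseq/models/bart/model.py` (only a few overrides shown):

```python
from omegaconf import OmegaConf

# Simplified from fairseq/models/bart/model.py; in fairseq this function is
# registered via @register_model_architecture("bart", "bart_large").
def bart_large_architecture(args):
    # Each line falls back to a bart_large default when the user did not set
    # the key. Under the buggy omegaconf, getattr() returns None for every
    # missing key, so the large-model defaults are never applied and the
    # model silently keeps the base-sized configuration.
    args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024)
    args.encoder_ffn_embed_dim = getattr(args, "encoder_ffn_embed_dim", 4 * 1024)
    args.encoder_layers = getattr(args, "encoder_layers", 12)
    args.encoder_attention_heads = getattr(args, "encoder_attention_heads", 16)
    # ... decoder overrides follow the same pattern ...

args = OmegaConf.create({"arch": "bart_large"})
bart_large_architecture(args)
print(args.encoder_layers)  # None under the buggy omegaconf; 12 once fixed
```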
- Using this `config.yaml` file in your current working directory:
```yaml
task:
  _name: span_masked_lm
  data: /path/to/data/binary
criterion: cross_entropy
dataset:
  batch_size: 16
  ignore_unused_valid_subsets: true
optimizer:
  _name: adam
  weight_decay: 0.01
  adam_betas: (0.9,0.98)
  adam_eps: 1e-06
lr_scheduler:
  _name: inverse_sqrt
  warmup_updates: 100000
optimization:
  clip_norm: 0.1
  lr: [1e-5]
  max_update: 5000000
  update_freq: [4]
model:
  arch: bart_large
```
- Executing `fairseq-hydra-train -m --config-dir ./ --config-name config` will successfully launch and train, but it will train only a 6-layer `bart_base` model because none of the architecture overrides were allowed to propagate through the `DictConfig` object.
Expected behavior
The above example should train a 12-layer `bart_large` model, since `arch: bart_large` is set.
Environment
- fairseq Version (e.g., 1.0 or main): d81fac8163364561fd6cd9d82b6ee1ba502c3526 (Aug 29 2022)
- PyTorch Version (e.g., 1.0): 1.12.1
- OS (e.g., Linux): Ubuntu 18.04
- How you installed fairseq (`pip`, source): source
- Build command you used (if compiling from source): `python -m pip install -e . --upgrade`
- Python version: 3.8
- CUDA/cuDNN version: 11.4
- GPU models and configuration: 4x 16GB Tesla V100
- Any other relevant information:
Had the same issue and fixed it by updating OmegaConf to version 2.1.0, following the discussion at https://github.com/omry/omegaconf/discussions/746. They fixed this inconsistency in that release, and I tested it: everything worked fine for BART pre-training. Since OmegaConf follows semantic versioning, this is only a minor version bump, so in theory updating the dependency in fairseq and hydra shouldn't be a problem.
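As a quick sanity check after upgrading (a minimal sketch):

```python
from importlib.metadata import version

from omegaconf import OmegaConf

print(version("omegaconf"))  # expect 2.1.0 or newer

# On a fixed install, missing-key attribute access raises AttributeError,
# so getattr() falls back to its default as expected:
cfg = OmegaConf.create({})
assert getattr(cfg, "missing_key", "fallback") == "fallback"
```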