
Bug in Argument for the RoBERTa architecture

Open diiogofernands opened this issue 3 years ago • 2 comments

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run with main-branch fairseq:

     python fairseq/train.py $DATA_DIR \
         --distributed-world-size 8 \
         --tpu \
         --log-format json \
         --log-interval $LOG_INTERVAL \
         --task masked_lm \
         --criterion masked_lm \
         --optimizer adam \
         --num-workers 4 \
         --adam-betas '(0.9,0.98)' \
         --adam-eps 1e-6 \
         --clip-norm 0.0 \
         --arch roberta_base \
         --sample-break-mode none \
         --tokens-per-sample $TOKENS_PER_SAMPLE \
         --lr-scheduler polynomial_decay \
         --lr $PEAK_LR \
         --save-dir checkpoints \
         --warmup-updates $WARMUP_UPDATES \
         --total-num-update $TOTAL_UPDATES \
         --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
         --batch-size $MAX_SENTENCES --update-freq $UPDATE_FREQ \
         --skip-invalid-size-inputs-valid-test \
         --save-interval 3 \
         --save-interval-updates $SAVE_INTERVAL \
         --mask-whole-words

  2. See the error:

     Traceback (most recent call last):
     AttributeError: 'MaskedLMTask' object has no attribute 'args'
     Exception in device=TPU:1: 'MaskedLMTask' object has no attribute 'args'
     Traceback (most recent call last):
       File "/anaconda3/envs/torch-xla-1.9/lib/python3.7/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 329, in _mp_start_fn
         _start_fn(index, pf_cfg, fn, args)
       File "/anaconda3/envs/torch-xla-1.9/lib/python3.7/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 323, in _start_fn
         fn(gindex, *args)
       File "/home/diogo/fairseq/fairseq/distributed/utils.py", line 328, in distributed_main
         main(cfg, **kwargs)
       File "/home/diogo/fairseq/fairseq_cli/train.py", line 124, in main
         task.load_dataset(valid_sub_split, combine=False, epoch=1)
       File "/home/diogo/fairseq/fairseq/tasks/masked_lm.py", line 174, in load_dataset
         if self.cfg.mask_whole_words
     AttributeError: 'MaskedLMTask' object has no attribute 'args'
     Traceback (most recent call last):
       File "fairseq/train.py", line 14, in <module>
         cli_main()
       File "/home/diogo/fairseq/fairseq_cli/train.py", line 507, in cli_main
         distributed_utils.call_main(cfg, main)
       File "/home/diogo/fairseq/fairseq/distributed/utils.py", line 365, in call_main
         nprocs=min(cfg.distributed_training.distributed_world_size, 8),
       File "/anaconda3/envs/torch-xla-1.9/lib/python3.7/site-packages/torch_xla/distributed/xla_multiprocessing.py", line 394, in spawn
         start_method=start_method)
       File "/anaconda3/envs/torch-xla-1.9/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
         while not context.join():
       File "/anaconda3/envs/torch-xla-1.9/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 144, in join
         exit_code=exitcode
     torch.multiprocessing.spawn.ProcessExitedException: process 7 terminated with exit code 17

Expected behavior

Environment

  • fairseq Version (main)
  • PyTorch Version (1.9)
  • OS (e.g., Linux):
  • How you installed fairseq (pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

diiogofernands avatar Sep 30 '21 17:09 diiogofernands

I also encountered it. Basically, pretraining RoBERTa with whole-word masking fails because the code assumes the existence of an args field, which was replaced by hydra's cfg.
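For context, here is a minimal standalone sketch (toy class names, not fairseq's actual code) of why the hydra migration breaks any code path that still reads self.args:

from dataclasses import dataclass

@dataclass
class ToyTaskConfig:
    mask_whole_words: bool = False
    bpe: str = "gpt2"

class MigratedTask:
    """A hydra-era task keeps its settings on self.cfg and no longer sets self.args."""
    def __init__(self, cfg: ToyTaskConfig):
        self.cfg = cfg

task = MigratedTask(ToyTaskConfig(mask_whole_words=True))
print(task.cfg.mask_whole_words)  # True: the cfg access path works
try:
    task.args                     # the legacy access path, as in get_whole_word_mask(self.args, ...)
except AttributeError as e:
    print(e)                      # 'MigratedTask' object has no attribute 'args'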

A quick fix is to change line 174 of fairseq/tasks/masked_lm.py from:

mask_whole_words = (
            get_whole_word_mask(self.args, self.source_dictionary)
            if self.cfg.mask_whole_words
            else None
        )

to:

mask_whole_words = (
            get_whole_word_mask(self.cfg.bpe, self.source_dictionary)
            if self.cfg.mask_whole_words
            else None
        )

and to pass the relevant BPE as the argument (for RoBERTa it should be "gpt2", if I recall correctly).
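For intuition about what that call ends up producing, here is a small self-contained sketch (toy vocabulary and helper, not fairseq's actual get_whole_word_mask) of the per-token mask for a GPT-2-style BPE dictionary: each entry is 1 if the token begins a word and 0 if it continues one.

import torch

# Toy GPT-2-style vocabulary: "Ġ" marks a leading space, i.e. the start of a word.
toy_vocab = ["<s>", "</s>", "Ġthe", "Ġquick", "Ġbro", "wn", "Ġfox"]

def is_beginning_of_word(tok: str) -> bool:
    # Special symbols and space-prefixed pieces start a word; bare pieces like "wn" do not.
    return tok.startswith("<") or tok.startswith("Ġ")

mask_whole_words = torch.ByteTensor([is_beginning_of_word(t) for t in toy_vocab])
print(mask_whole_words)  # tensor([1, 1, 1, 1, 1, 0, 1], dtype=torch.uint8)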

@arbabu123 @alexeib @ott

yuvalkirstain avatar Jan 13 '22 06:01 yuvalkirstain

In addition to modifying line 174 of masked_lm.py as above, add a bpe field to MaskedLMConfig:

from dataclasses import field
from typing import Optional

bpe: Optional[str] = field(
    default="",
    metadata={"help": "BPE to use; set --bpe when using --mask-whole-words"},
)

Then pass --mask-whole-words --bpe gpt2 on the command line.
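For intuition on why adding the dataclass field is enough to surface --bpe on the command line: fairseq builds the task's CLI options from its config dataclass, so a new field becomes a new flag. A rough standalone illustration of that pattern (plain argparse and made-up helper names, not fairseq's actual dataclass-to-parser code):

import argparse
from dataclasses import dataclass, field, fields
from typing import Optional

@dataclass
class ToyMaskedLMConfig:
    mask_whole_words: bool = False
    bpe: Optional[str] = field(
        default="",
        metadata={"help": "BPE to use when masking whole words (e.g. gpt2)"},
    )

def parser_from_dataclass(dc) -> argparse.ArgumentParser:
    # Turn each dataclass field into a --flag, loosely mimicking how fairseq
    # derives its CLI from the task config.
    parser = argparse.ArgumentParser()
    for f in fields(dc):
        flag = "--" + f.name.replace("_", "-")
        if f.type is bool:
            parser.add_argument(flag, action="store_true", help=f.metadata.get("help", ""))
        else:
            parser.add_argument(flag, default=f.default, help=f.metadata.get("help", ""))
    return parser

args = parser_from_dataclass(ToyMaskedLMConfig).parse_args(
    ["--mask-whole-words", "--bpe", "gpt2"]
)
print(args.mask_whole_words, args.bpe)  # True gpt2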

jiaohuix avatar Oct 23 '22 16:10 jiaohuix

@yuvalkirstain @MiuGod0126 Thank you so much for this info!

usuyama avatar Feb 06 '23 01:02 usuyama