
roberta-base wrong number of parameters

mumu12641 opened this issue 8 months ago · 1 comment

❓ Questions and Help

What is your question?

When I followed the pre-training instructions (examples/roberta/README.pretraining.md) and pre-trained RoBERTa from scratch with the roberta-base configuration, the reported number of parameters was 210M instead of the expected 125M. Why is that?

Code

My pretraining shell script:

```bash
#! /usr/bin/bash

TOTAL_UPDATES=50000      # Total number of training steps
WARMUP_UPDATES=3000      # Warmup the learning rate over this many updates
TOKENS_PER_SAMPLE=512    # Max sequence length
MAX_POSITIONS=512        # Num. positional embeddings (usually same as above)
MAX_SENTENCES=64
PEAK_LR=0.0005           # Peak learning rate, adjust as needed
CLIP_NORM=0
PORT=$(( $RANDOM + 2000 ))
prefix=new_test
GPUS=$1
ARCH=$2
DATA_DIR=data-bin/wikitext-103
UPDATE_FREQ=$(( 512 / $MAX_SENTENCES / $GPUS ))

nohup fairseq-train $DATA_DIR \
    --seed 42 \
    --user-dir . \
    --task masked_lm --criterion masked_lm \
    --distributed-world-size $1 \
    --save-dir checkpoints/$prefix/$ARCH \
    --arch $ARCH --sample-break-mode complete --tokens-per-sample $TOKENS_PER_SAMPLE \
    --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm $CLIP_NORM \
    --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --batch-size $MAX_SENTENCES --update-freq $UPDATE_FREQ \
    --ddp-backend=legacy_ddp \
    --no-epoch-checkpoints \
    --find-unused-parameters \
    --max-update $TOTAL_UPDATES --log-format simple --log-interval 50 > $ARCH.log &
```
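For what it's worth, with these settings one optimizer step covers MAX_SENTENCES × UPDATE_FREQ × GPUS sequences. For example (hypothetical GPU count), launching with GPUS=2 gives UPDATE_FREQ = 512 / 64 / 2 = 4, i.e. an effective batch of 64 × 4 × 2 = 512 sequences of up to 512 tokens per update.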

What have you tried?

What's your environment?

  • fairseq Version (e.g., 1.0 or main): 0.12.2 (from pip)
  • PyTorch Version (e.g., 1.0): 2.8.0+cu128
  • OS (e.g., Linux): Linux
  • How you installed fairseq (pip, source): pip
  • Build command you used (if compiling from source):
  • Python version: 3.10.16
  • CUDA/cuDNN version: 12.8
  • GPU models and configuration: H20
  • Any other relevant information: My training log (screenshot attached) reports 210M parameters instead of 125M, perhaps because every bias is enabled; when I experimented with a custom model and set all biases to False, the count came out to roughly 125M. A small script for inspecting where the parameters actually are is sketched after this list.
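Here is a minimal sketch for that inspection, assuming a checkpoint path under the --save-dir used above (the directory and filename below are examples; point them at whatever your run produced). It totals the saved model tensors, sums the bias tensors separately, and lists the largest tensors:

```python
# Minimal sketch: inspect a fairseq checkpoint and see which tensors dominate
# the parameter count. The path below is an assumed example; adjust to your run.
import torch

ckpt = torch.load(
    "checkpoints/new_test/roberta_base/checkpoint_last.pt",
    map_location="cpu",
    weights_only=False,  # fairseq checkpoints also pickle config/metadata objects
)
state = ckpt["model"]  # fairseq stores the model weights under the "model" key

total = sum(v.numel() for v in state.values())
biases = sum(v.numel() for k, v in state.items() if k.endswith(".bias"))
print(f"total: {total / 1e6:.1f}M  (of which biases: {biases / 1e6:.2f}M)")

# five largest tensors
for name, t in sorted(state.items(), key=lambda kv: kv[1].numel(), reverse=True)[:5]:
    print(f"{t.numel() / 1e6:8.2f}M  {name}  {tuple(t.shape)}")
```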

mumu12641 · Mar 25 '25

In the end I had no choice but to hack fairseq's source code and set all biases to False, which gave me a 125M roberta-base.
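For reference, the usual ~125M figure for the published roberta-base follows from its standard configuration (GPT-2 BPE vocabulary of 50,265 tokens, hidden size 768, FFN size 3072, 12 layers, 514 position embeddings). A back-of-the-envelope count is below; note that the total scales with the size of whatever dictionary your data was actually binarized with, since the vocabulary term dominates the embedding and LM-head contributions.

```python
# Back-of-the-envelope parameter count for the standard roberta_base
# configuration (published hyperparameters, not values read from this run).
V, P, H, F, L = 50265, 514, 768, 3072, 12  # vocab, positions, hidden, FFN, layers

embeddings = V * H + P * H + 2 * H          # token + positional + embedding LayerNorm
per_layer = (
    4 * (H * H + H)                         # Q, K, V, output projections (+ biases)
    + (H * F + F) + (F * H + H)             # FFN up/down projections (+ biases)
    + 2 * (2 * H)                           # two LayerNorms
)
lm_head = (H * H + H) + 2 * H + V           # dense + LayerNorm + output bias (weight is tied)

total = embeddings + L * per_layer + lm_head
print(f"{total / 1e6:.1f}M")                # ~124.7M, i.e. the usual "125M"
```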

mumu12641 · Mar 28 '25