roberta-base wrong number of parameters
❓ Questions and Help
What is your question?
When I followed the pre-training instructions (examples/roberta/README.pretraining.md) and pre-trained RoBERTa from scratch using roberta-base, the model ended up with 210M parameters instead of the expected 125M. Why is that?
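For reference, here is a minimal sketch of how the parameter count of the released roberta.base can be checked (assuming torch.hub can download the fairseq checkpoint):

import torch

# Load the released roberta.base through torch.hub.
roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
roberta.eval()

# Count the parameters of the underlying RobertaModel
# (tied weights such as the LM head are counted once).
n_params = sum(p.numel() for p in roberta.model.parameters())
print(f'{n_params / 1e6:.1f}M parameters')  # roughly 125M for roberta.base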
Code
My pre-training shell script:
#!/usr/bin/bash
TOTAL_UPDATES=50000 # Total number of training steps
WARMUP_UPDATES=3000 # Warmup the learning rate over this many updates
TOKENS_PER_SAMPLE=512 # Max sequence length
MAX_POSITIONS=512 # Num. positional embeddings (usually same as above)
MAX_SENTENCES=64 # Sequences per GPU per batch
PEAK_LR=0.0005 # Peak learning rate, adjust as needed
CLIP_NORM=0
PORT=$(( $RANDOM + 2000 ))
prefix=new_test # Checkpoint subdirectory name
GPUS=$1 # Number of GPUs (first CLI argument)
ARCH=$2 # Model architecture, e.g. roberta_base (second CLI argument)
DATA_DIR=data-bin/wikitext-103
UPDATE_FREQ=$(( 512 / $MAX_SENTENCES / $GPUS )) # Accumulate gradients to ~512 sequences per update
nohup fairseq-train $DATA_DIR \
    --seed 42 \
    --user-dir . \
    --task masked_lm --criterion masked_lm \
    --distributed-world-size $1 \
    --save-dir checkpoints/$prefix/$ARCH \
    --arch $ARCH --sample-break-mode complete --tokens-per-sample $TOKENS_PER_SAMPLE \
    --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm $CLIP_NORM \
    --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \
    --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
    --batch-size $MAX_SENTENCES --update-freq $UPDATE_FREQ \
    --ddp-backend=legacy_ddp \
    --no-epoch-checkpoints \
    --find-unused-parameters \
    --max-update $TOTAL_UPDATES --log-format simple --log-interval 50 > $ARCH.log &
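As a side note on the config above, UPDATE_FREQ is chosen so that the effective batch size stays at roughly 512 sequences per update regardless of the GPU count; a quick sanity check of that arithmetic (GPUS=8 is only an example value for $1):

# Sanity check of the effective batch size implied by the script above.
MAX_SENTENCES = 64       # sequences per GPU
GPUS = 8                 # example value for $1
TOKENS_PER_SAMPLE = 512
UPDATE_FREQ = 512 // MAX_SENTENCES // GPUS                     # -> 1
sequences_per_update = MAX_SENTENCES * GPUS * UPDATE_FREQ      # -> 512
tokens_per_update = sequences_per_update * TOKENS_PER_SAMPLE   # -> 262144
print(sequences_per_update, tokens_per_update)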
What have you tried?
What's your environment?
- fairseq Version (e.g., 1.0 or main): 0.12.2 (from pip)
- PyTorch Version (e.g., 1.0): 2.8.0+cu128
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): pip
- Build command you used (if compiling from source):
- Python version: 3.10.16
- CUDA/cuDNN version: 12.8
- GPU models and configuration: H20
- Any other relevant information:
My training log reports 210M parameters rather than 125M, perhaps because every bias term is enabled (when I experimented with a custom model and set all biases to False, I got roughly 125M parameters).
In the end I had no choice but to patch the fairseq source code to set all biases to False, which gave me a 125M-parameter roberta-base.
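In case it is useful for debugging, here is a rough sketch of how the saved state dict can be broken down into bias and non-bias parameters (the checkpoint path is hypothetical and follows the --save-dir layout in my script; fairseq checkpoints normally keep the weights under the 'model' key):

import torch
from collections import defaultdict

# Hypothetical path: checkpoints/$prefix/$ARCH from the script above.
ckpt = torch.load('checkpoints/new_test/roberta_base/checkpoint_last.pt', map_location='cpu')
state = ckpt['model']

totals = defaultdict(int)
for name, tensor in state.items():
    group = 'bias' if name.endswith('.bias') else 'weight/other'
    totals[group] += tensor.numel()

# Note: tied weights (e.g. the LM head) can appear under more than one key
# in a state dict, so shared tensors may be counted more than once here.
for group, count in totals.items():
    print(f'{group}: {count / 1e6:.1f}M')
print(f'total: {sum(totals.values()) / 1e6:.1f}M')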