Hyperparameters for the IWSLT14 dataset
I ran training on IWSLT14 with the hyperparameters below, but the result seems poor: only BLEU 26 on IWSLT14 de-en. Does anyone know appropriate hyperparameters for the IWSLT dataset? Thanks a lot!
```bash
export CUDA_VISIBLE_DEVICES=1
dataset=data-bin/distill_iwslt14.tokenized.de-en
save_dir=model/iwslt14-dslp
log=${save_dir}/train.log
max_update=250000
lr=0.0002
layers=5
dim=256

python3 train.py ${dataset} --source-lang de --target-lang en --save-dir ${save_dir} --eval-tokenized-bleu \
    --task translation_lev --criterion nat_loss --arch nat_sd \
    --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 3 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr ${lr} \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 4000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update ${max_update} --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.3 --max-tokens 8192 \
    --length-loss-factor 0.1 --pred-length-offset \
    --encoder-layers ${layers} --encoder-embed-dim ${dim} --decoder-layers ${layers} --decoder-embed-dim ${dim} --encoder-ffn-embed-dim 1024 | tee -a ${log}
```
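One thing worth trying before changing hyperparameters: since the run already keeps the five best checkpoints (`--keep-best-checkpoints 5`), averaging them before evaluation often recovers a fraction of a BLEU point. A sketch using fairseq's `scripts/average_checkpoints.py` (the checkpoint filename pattern is an assumption; check what `ls ${save_dir}` actually shows):

```shell
# Average the best checkpoints kept by --keep-best-checkpoints 5.
# Filename pattern checkpoint.best_bleu_*.pt is an assumption --
# verify against the contents of model/iwslt14-dslp first.
python3 scripts/average_checkpoints.py \
    --inputs model/iwslt14-dslp/checkpoint.best_bleu_*.pt \
    --output model/iwslt14-dslp/checkpoint.avg_best5.pt
```

Then point your generation/evaluation command at `checkpoint.avg_best5.pt` instead of `checkpoint_best.pt` and compare BLEU.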