Language model hyperparameters
❓ Questions and Help
Before asking:
- search the issues.
- search the docs.
What is your question?
Hi, hope you're well. I have a dataset of journal abstracts separated by empty lines. I want to use each abstract as a training sample (I read that I should use --sample-break-mode complete_doc for this), and I want to train on 1 GPU for 200k updates with 1024 tokens and 64 gradient-accumulation steps. I also want to use Adam as the optimizer with a peak learning rate of 0.0002 and 20,000 warm-up steps, with the learning rate following an inverse square root decay schedule after reaching the peak.
I'm not sure I chose the right values for each hyperparameter in the command below, so could you please correct it? I also read somewhere that batch size = --max-tokens / --tokens-per-sample, but I'm not sure how to set the batch size when using --sample-break-mode complete_doc. Am I right that --tokens-per-sample no longer matters once --sample-break-mode complete_doc is used?
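For reference, this is the rough batching arithmetic as I currently understand it, using the values from the command below. These are only back-of-the-envelope bounds I assumed, not verified behavior; with --sample-break-mode complete_doc, samples can be shorter than --tokens-per-sample, so the real per-batch counts will vary.

# Back-of-the-envelope batching arithmetic (my assumption, not verified):
#   tokens per GPU batch          <= --max-tokens                        = 2048
#   full-length samples per batch ~= --max-tokens / --tokens-per-sample  = 2048 / 512 = 4
#   tokens per optimizer update   <= --max-tokens * --update-freq        = 2048 * 64  = 131072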
Code
What have you tried?
!fairseq-train --task language_modeling data-bin \
    --save-dir checkpoints/transformer \
    --arch transformer_lm_gpt2_medium --share-decoder-input-output-embed \
    --dropout 0.1 --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
    --lr 0.0002 \
    --lr-scheduler inverse_sqrt --warmup-updates 20000 --warmup-init-lr 1e-07 \
    --sample-break-mode complete_doc --tokens-per-sample 512 \
    --max-tokens 2048 --update-freq 64 --fp16 --bpe fastbpe \
    --max-update 50000 --max-epoch 15
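For clarity, here is a sketch of the same command adjusted to the targets I described above (interpreting "1024 tokens" as --tokens-per-sample 1024, 200k steps as --max-update 200000, and 64 accumulated steps as --update-freq 64). I'm not confident about keeping --max-tokens 2048, and I dropped --max-epoch 15 so the 200k-update budget controls how long training runs; please correct anything that looks wrong.

!fairseq-train --task language_modeling data-bin \
    --save-dir checkpoints/transformer \
    --arch transformer_lm_gpt2_medium --share-decoder-input-output-embed \
    --dropout 0.1 --optimizer adam --adam-betas '(0.9, 0.98)' --weight-decay 0.01 --clip-norm 0.0 \
    --lr 0.0002 --lr-scheduler inverse_sqrt --warmup-updates 20000 --warmup-init-lr 1e-07 \
    --sample-break-mode complete_doc --tokens-per-sample 1024 \
    --max-tokens 2048 --update-freq 64 --fp16 --bpe fastbpe \
    --max-update 200000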
What's your environment?
- fairseq Version (e.g., 1.0 or main):
- PyTorch Version (e.g., 1.0)
- OS (e.g., Linux):
- How you installed fairseq (pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information: