RetroMAE

Loss drop to 0 after 4 epochs

Iambestfeed opened this issue 10 months ago · 0 comments

I have logged the training run to wandb (link below). [image: loss curve from wandb]

This is my config:

pretrain.run --output_dir output_merge_data \
--report_to wandb \
--data_dir data/merge_final_dataset.parquet \
--do_train True \
--save_steps 100000 \
--per_device_train_batch_size 12 \
--model_name_or_path iambestfeed/BanhmiBERT \
--pretrain_method retromae \
--fp16 True \
--warmup_ratio 0.1 \
--learning_rate 1e-4 \
--num_train_epochs 8 \
--overwrite_output_dir True \
--dataloader_num_workers 6 \
--weight_decay 0.01 \
--encoder_mlm_probability 0.3 \
--decoder_mlm_probability 0.5
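For context, the two `--*_mlm_probability` flags set RetroMAE's asymmetric masking: the encoder sees a lightly masked input (30%) while the decoder reconstructs from a heavily masked one (50%). A minimal sketch of independent per-token masking (the `[MASK]` string and the helper are illustrative, not the repo's actual implementation):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mlm_probability, rng):
    # Each token is independently replaced by [MASK] with probability mlm_probability.
    return [MASK if rng.random() < mlm_probability else t for t in tokens]

rng = random.Random(0)
tokens = ["tok"] * 10_000
enc_input = mask_tokens(tokens, 0.30, rng)  # --encoder_mlm_probability 0.3
dec_input = mask_tokens(tokens, 0.50, rng)  # --decoder_mlm_probability 0.5

enc_rate = enc_input.count(MASK) / len(tokens)
dec_rate = dec_input.count(MASK) / len(tokens)
```

With enough tokens the empirical mask rates should sit close to the configured probabilities, which is a quick sanity check that the masking collator behaves as intended.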

As you can see, as soon as epoch 4 finishes, the loss drops to 0. What do you think is going on?
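To pin down exactly where the collapse starts, here is a hypothetical helper (not part of the repo) that scans a logged loss history, e.g. exported from wandb, for the first step where the loss stays at ~0 for a sustained window:

```python
def first_collapse_step(loss_history, window=50, tol=1e-8):
    """Return the index at which the loss first stays below `tol` for
    `window` consecutive logged steps, or None if it never collapses."""
    for i in range(window, len(loss_history) + 1):
        if all(abs(l) < tol for l in loss_history[i - window:i]):
            return i - window
    return None
```

Knowing whether the collapse aligns exactly with the epoch boundary (rather than mid-epoch) would help distinguish a data/state issue at epoch rollover from, say, fp16 numerical problems.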

Iambestfeed · Apr 08 '24 07:04