RetroMAE
Loss drops to 0 after 4 epochs
I have logged the training parameters to wandb (see the link below).
This is my config:
```bash
pretrain.run --output_dir output_merge_data \
--report_to wandb \
--data_dir data/merge_final_dataset.parquet \
--do_train True \
--save_steps 100000 \
--per_device_train_batch_size 12 \
--model_name_or_path iambestfeed/BanhmiBERT \
--pretrain_method retromae \
--fp16 True \
--warmup_ratio 0.1 \
--learning_rate 1e-4 \
--num_train_epochs 8 \
--overwrite_output_dir True \
--dataloader_num_workers 6 \
--weight_decay 0.01 \
--encoder_mlm_probability 0.3 \
--decoder_mlm_probability 0.5
```
As you can see, the loss drops to 0 as soon as epoch 4 finishes. What do you think is going on?
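One failure mode I considered ruling out: if the data collator stops producing masked positions partway through training (for example, if the dataset iterator is exhausted and batches come back with nothing to predict), the masked-LM cross-entropy is averaged over zero tokens and degenerates. The snippet below is just a plain-PyTorch sketch of that edge case, not RetroMAE code; `mlm_loss`, the tensor shapes, and the vocab size are made up for illustration:

```python
import torch
import torch.nn.functional as F

def mlm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Standard masked-LM cross-entropy; label -100 marks unmasked
    # positions, which are excluded from the loss.
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

vocab_size = 30522
logits = torch.randn(2, 8, vocab_size)

# Normal batch: at least one position is masked and carries a real label.
labels = torch.full((2, 8), -100, dtype=torch.long)
labels[0, 3] = 42
print(mlm_loss(logits, labels))  # finite positive loss

# Degenerate batch: the collator masked nothing, so every label is -100.
empty_labels = torch.full((2, 8), -100, dtype=torch.long)
print((empty_labels != -100).sum().item())  # 0 tokens to predict
print(mlm_loss(logits, empty_labels))       # tensor(nan): mean over zero tokens
```

It may also be worth checking in the wandb run whether the raw per-step loss actually reaches exactly 0 or first becomes NaN (e.g. from an fp16 overflow) and is only displayed as 0.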