UNITER
How do I judge whether the pre-training model has converged?
How should the loss weights of the different pre-training tasks be determined? Which task's loss determines whether training has converged?