albert_pytorch: the position of global_step causes multiple needless evaluate() calls

Open · illusions-LYY opened this issue 5 years ago · 0 comments

https://github.com/lonePatient/albert_pytorch/blob/e9dbe3ce9aa49e787774b050cbdc496046e0c5bf/run_classifier.py#L110-L122

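The link above points to lines 110–122 of run_classifier.py. With the default args.gradient_accumulation_steps = 1 there is no problem. With any other value, say 4, the first 3 iterations of the outer loop (step = 0 to 2) fail the if check on line 110, so global_step stays at 0. Consequently the if check on line 116 passes on each of those iterations, because global_step % args.logging_steps == 0 always holds when global_step = 0. The result is 3 needless evaluate() calls before a single gradient update has happened.

So this looks like a flaw. As I understand it, global_step should stay consistent with num_training_steps from line 78 (logger.info("  Total optimization steps = %d", num_training_steps)): it should be incremented only once per gradient update, since each update means one effective batch of data has actually been processed. This is also why train() ultimately returns loss = tr_loss / global_step. Could we simply add a global_step != 0 condition to the checks on lines 116 and 121? I think that would probably work around the problem for now.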
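The control flow described above can be checked with a small simulation. This is a sketch under stated assumptions, not the repository's code: there is no model or PyTorch, evaluate() is replaced by recording the micro-batch index at which it would fire, and the function name simulate_eval_steps and its parameters skip_step_zero and only_after_update are hypothetical. skip_step_zero models the global_step != 0 guard proposed in this issue; only_after_update is a different variant, not proposed here, that only evaluates on the micro-batch where an optimizer update just happened.

```python
def simulate_eval_steps(num_batches, grad_accum_steps, logging_steps,
                        skip_step_zero=False, only_after_update=False):
    """Return (eval_steps, global_step) for a simulated training loop.

    eval_steps lists the micro-batch indices at which evaluate() would fire.
    The loop structure is an assumption based on the issue's description.
    """
    global_step = 0
    eval_steps = []
    for step in range(num_batches):
        updated = False
        # Assumed equivalent of the line-110 check: one optimizer update
        # every grad_accum_steps micro-batches.
        if (step + 1) % grad_accum_steps == 0:
            global_step += 1  # counts optimizer updates, not micro-batches
            updated = True
        # Assumed equivalent of the line-116 logging check.
        should_log = logging_steps > 0 and global_step % logging_steps == 0
        if skip_step_zero:
            # The guard proposed in this issue.
            should_log = should_log and global_step != 0
        if only_after_update:
            # Hypothetical alternative: evaluate only right after an update.
            should_log = should_log and updated
        if should_log:
            eval_steps.append(step)
    return eval_steps, global_step


if __name__ == "__main__":
    # 16 micro-batches, accumulation of 4, logging every 2 updates.
    print(simulate_eval_steps(16, 4, 2))
    # → ([0, 1, 2, 7, 8, 9, 10, 15], 4)
    print(simulate_eval_steps(16, 4, 2, skip_step_zero=True))
    # → ([7, 8, 9, 10, 15], 4)
```

Under these assumptions, steps 0–2 reproduce the 3 evaluations before the first update, and the final global_step (4 = 16 // 4) matches the number of optimizer updates. The guard removes those 3 calls, but the simulation also suggests evaluate() still fires on every micro-batch while global_step sits at a multiple of logging_steps (steps 7–10); with only_after_update=True it fires once per logging interval (steps 7 and 15). With grad_accum_steps=1 neither variant changes anything.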

May 06 '20 12:05 illusions-LYY
