PaddleFleetX

EMA has no effect

Open shippingwang opened this issue 4 years ago • 5 comments

EMA has no effect when used in code built on Fleet. See the code at: https://github.com/PaddlePaddle/PaddleClas/blob/master/tools/program.py#L364 https://github.com/PaddlePaddle/PaddleClas/blob/master/tools/train.py#L119

EMA reference: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/optimizer_cn/ExponentialMovingAverage_cn.html#exponentialmovingaverage

I suspect the problem is in distributed_optimizer.
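
For context, here is a minimal sketch of the plain EMA recurrence, shadow = decay * shadow + (1 - decay) * param. It is written in pure Python and is not Paddle's implementation; the helper name `ema_update` and the toy parameter `w` are hypothetical, and bias correction and decay scheduling are left out. It only illustrates what "no effect" means here: if the update step never runs, the shadow value stays at its starting point.

```python
def ema_update(shadow, params, decay=0.999):
    """Return new shadow values: decay * shadow + (1 - decay) * param."""
    return {
        name: decay * shadow[name] + (1.0 - decay) * value
        for name, value in params.items()
    }


params = {"w": 1.0}
shadow = dict(params)   # shadow weights start as a copy of the parameters
initial_shadow = dict(shadow)

for step in range(3):
    params["w"] += 1.0  # stand-in for one optimizer step (hypothetical)
    shadow = ema_update(shadow, params, decay=0.5)

# The shadow value lags behind the raw parameter but has moved away
# from its initial value, which is what a working EMA should do.
print(params["w"], shadow["w"], initial_shadow["w"])
```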

shippingwang avatar Jul 11 '20 14:07 shippingwang

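That looks right. With the default options, the program is converted directly into a compiled program and stored in fleet.main_program, so operations added after minimize are not reflected in the computation graph. You could try the following options: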

    dist_strategy.mode = "collective"
    dist_strategy.collective_mode = "grad_allreduce"

@shippingwang
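
The explanation above can be pictured with a toy model. This is only an analogy in pure Python, not Paddle's API: `ToyProgram`, its `compile` method, and the op names are all hypothetical. It assumes that compiling takes a snapshot of the op list, so anything appended afterwards (such as an EMA update) is missing from what actually runs.

```python
class ToyProgram:
    """Hypothetical stand-in for a program that is a growing list of ops."""

    def __init__(self):
        self.ops = []

    def append_op(self, name):
        self.ops.append(name)

    def compile(self):
        # Freeze the current op list; later appends are not seen.
        return tuple(self.ops)


program = ToyProgram()
program.append_op("forward")
program.append_op("backward")
program.append_op("sgd_update")

compiled = program.compile()      # snapshot taken at "minimize" time
program.append_op("ema_update")   # added after the snapshot

print(program.ops)  # the original program contains ema_update
print(compiled)     # the frozen snapshot does not
```

If the snapshot is what gets executed, the EMA update never runs, which matches the symptom reported in this issue.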

guru4elephant avatar Jul 12 '20 00:07 guru4elephant

OK 👍 EMA updates now.

I still need to check the accuracy.

shippingwang avatar Jul 13 '20 02:07 shippingwang

OK, please post your conclusion here, @shippingwang.

guru4elephant avatar Jul 13 '20 04:07 guru4elephant

@shippingwang Did the answer above solve your problem?

gavin1332 avatar Jul 17 '20 06:07 gavin1332

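The accuracy is still being verified, and I probably won't have a conclusion until next weekend. I'll update this issue promptly after that.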

shippingwang avatar Jul 17 '20 08:07 shippingwang