involution
NaN
When I use it, I meet NaN.
Please specify the experimental details.
I added it to a YOLO network, replacing the conv in the PANet layers with involution. With conv there is no NaN, but after switching to involution, NaN appears.
like this
Is there any way to solve this? I tried scaling the loss down, but it didn't help.
You may try gradient clipping, which we also use sometimes when training our detection models, for example: https://github.com/d-li14/involution/blob/main/det/configs/involution/retinanet_red50_neck_fpn_1x_coco.py#L8
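For anyone not training with mmdetection, here is a minimal sketch of the same gradient clipping idea in a plain PyTorch training loop; the max_norm value below is just an assumed starting point, not the value from the linked config.

```python
import torch

def train_step(model, batch, criterion, optimizer, max_norm=35.0):
    images, targets = batch
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    # Clip the global gradient norm before the optimizer step so a few
    # exploding involution gradients cannot push the weights to NaN/inf.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm, norm_type=2)
    optimizer.step()
    return loss.item()
```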
thank you so much
Even after I used gradient clipping, it still seems to produce NaN.
I replaced the conv in the resblocks of the super-resolution model EDSR with involution and used the gradient clipping method, but the loss is still inf.
Have you solved it now?
Not yet.
Me neither; we can discuss it.
May I ask, have you solved it now?
My loss on the training set is fine, but on the validation set some batches are NaN. It's definitely not gradient explosion. I don't know how to find the problem and debug it.
Maybe your dataset is not pure?
I also met this problem in a generation task. I replaced the 3x3 conv with involution, and the loss is NaN or inf.
I haven't solved it either, so I'm about to give up on using involution.
I also tried the gradient clipping method, but the NaN problem was not solved. I will try to find some other methods that may work.
I also tried the gradient clipping method, but it didn't work. If you have any good methods, please share them. Thank you.
https://github.com/d-li14/involution/issues/26#issuecomment-819443734
I also met the same problem when dealing with the pose estimation task.
ok
When I used involution to replace the CA module in RCAN, the loss was also very large.
I replaced the standard conv with involution and added BN, and then the loss seems normal. But the final result is worse than the EDSR baseline with a BN layer, even though I increased the parameter count of the EDSR-involution model. Now I have given up. You can have a try and we can talk. @LJill
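For reference, a rough sketch of what "replace conv with involution and add BN" could look like in an EDSR-style residual block. It assumes an involution(channels, kernel_size, stride) module such as the naive PyTorch implementation in this repo; the import path and the block structure are illustrative assumptions, not the commenter's exact code.

```python
import torch.nn as nn
from involution_naive import involution  # naive PyTorch involution; import path is an assumption

class InvResBlock(nn.Module):
    """EDSR-style residual block with the two convs replaced by involution + BN."""
    def __init__(self, channels, kernel_size=7, res_scale=1.0):
        super().__init__()
        self.body = nn.Sequential(
            involution(channels, kernel_size, stride=1),
            nn.BatchNorm2d(channels),   # the added BN that keeps the loss finite
            nn.ReLU(inplace=True),
            involution(channels, kernel_size, stride=1),
            nn.BatchNorm2d(channels),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.body(x)
```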
Thanks for your reply. I tried your method on EDSR and RCAN; it works, and the loss is normal now. I will run experiments to observe the final result.
When I replace the conv with involution and add BN, the train loss seems normal, but the val loss is still NaN. Has this happened with your model?
After I switched to involution, the parameters don't seem to be optimized: the train loss keeps decreasing, but the val loss stays at one value without changing. Does anyone know whether this is caused by overfitting or by a code error? I don't think it is overfitting, because the train loss decreases while the val loss barely changes. I still haven't solved this problem.
What causes this problem? I also met this issue: the train loss improves, but the val loss is unchanged.
I implemented a pure PyTorch 2D involution and faced a similar issue of NaNs occurring during training when using the involution as a plug-in replacement for convolutions. In my case this was caused by exploding activations. For me, the issue could be solved by using a higher momentum (0.3) in the batch normalization (after reduction) layer. I guess the distribution of the activations changes so much that batch norm, with track_running_stats=True and momentum=0.1, cannot follow the changing distribution, resulting in exploding activations. This was my conclusion after looking at the PyTorch batch norm implementation, which also uses the running stats for normalization during training (correct me if I'm wrong).
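A minimal sketch of that fix: raise the BatchNorm momentum in the kernel-generation (reduction) branch of a pure PyTorch 2D involution so its running statistics can track the fast-changing activations. The module below follows the structure of the naive involution implementation; the names and default hyperparameters are assumptions, not the commenter's exact code.

```python
import torch.nn as nn

class Involution2d(nn.Module):
    def __init__(self, channels, kernel_size=7, stride=1, reduction=4, group_channels=16):
        super().__init__()
        self.kernel_size, self.stride = kernel_size, stride
        self.groups = channels // group_channels
        # Kernel-generation branch: the BN here gets momentum=0.3 instead of the 0.1 default.
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.BatchNorm2d(channels // reduction, momentum=0.3),
            nn.ReLU(inplace=True),
        )
        self.span = nn.Conv2d(channels // reduction, kernel_size * kernel_size * self.groups, 1)
        self.down = nn.AvgPool2d(stride, stride) if stride > 1 else nn.Identity()
        self.unfold = nn.Unfold(kernel_size, padding=(kernel_size - 1) // 2, stride=stride)

    def forward(self, x):
        b, c, h, w = x.shape
        h_out, w_out = h // self.stride, w // self.stride
        # Per-pixel, group-shared kernels generated from the (pooled) input.
        weight = self.span(self.reduce(self.down(x)))
        weight = weight.view(b, self.groups, 1, self.kernel_size ** 2, h_out, w_out)
        # Unfold local patches and apply the dynamic kernels group-wise.
        out = self.unfold(x).view(b, self.groups, c // self.groups, self.kernel_size ** 2, h_out, w_out)
        return (weight * out).sum(dim=3).view(b, c, h_out, w_out)
```

Note that channels must be divisible by both group_channels and reduction for the shapes to work out.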
@cymdhx @songwaimai @whf9527
I solved the NaN problem I encountered; here is my solution, though I don't know whether it applies to your cases. Problem description: NaN/inf appeared when changing UNet + ResNet to UNet + RedNet50. Solution: remove the following code from the program; do not manually initialize the BatchNorm weight and bias.
# Manual BatchNorm affine initialization -- removing this fixed the NaN/inf
def set_bn_init(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.weight.data.fill_(1.0)
        m.bias.data.fill_(0.0)