PIDNet

Error if target consists of only background class

Open Harut0726 opened this issue 2 years ago • 15 comments

Hi, thank you for sharing your work. I tried to train the model on my custom dataset derived from CamVid, and I'm getting an error while computing the loss function when the target consists only of the background class (255). The error appears on line 73 of utils/criterion.py.

pred = pred.gather(1, tmp_target.unsqueeze(1))
pred, ind = pred.contiguous().view(-1,)[mask].contiguous().sort()
min_value = pred[min(self.min_kept, pred.numel() - 1)]  # the error is raised here: pred is empty because mask contains only False values
threshold = max(min_value, self.thresh)
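
For anyone who wants to reproduce this outside the repository, here is a minimal sketch. It assumes an ignore label of 255 and uses a hypothetical helper, `kept_confidences`, that mirrors the lines quoted above. It is not the code from utils/criterion.py itself, and the tensor shapes are made up for illustration.

```python
import torch

IGNORE_LABEL = 255  # assumed ignore value, as described in the post


def kept_confidences(target, probs):
    """Sorted confidence of the ground-truth class at every non-ignored pixel."""
    mask = target.contiguous().view(-1) != IGNORE_LABEL
    tmp_target = target.clone()
    tmp_target[tmp_target == IGNORE_LABEL] = 0  # keep gather indices in range
    pred = probs.gather(1, tmp_target.unsqueeze(1))
    pred, _ = pred.contiguous().view(-1)[mask].contiguous().sort()
    return pred


probs = torch.softmax(torch.randn(1, 3, 4, 4), dim=1)            # N, C, H, W
target = torch.full((1, 4, 4), IGNORE_LABEL, dtype=torch.long)   # only ignored pixels

pred = kept_confidences(target, probs)
print(pred.numel())  # no pixel survives the mask

try:
    pred[min(100, pred.numel() - 1)]  # index -1 into an empty tensor
    raised = False
except IndexError:
    raised = True
print(raised)
```

With an all-ignore target the mask selects nothing, so `pred.numel() - 1` becomes -1 and the indexing fails.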

Harut0726 avatar Jul 18 '22 16:07 Harut0726

I got the same error. Any updates on this?

Daniel595 avatar Aug 03 '22 14:08 Daniel595

I also noticed this error. My current workaround:

        if pred.numel() > 0:
            min_value = pred[min(self.min_kept, pred.numel() - 1)]
            threshold = max(min_value, self.thresh)
        else:
            threshold = self.thresh

For me, this is an infrequent error, so the workaround does not seem to affect training negatively. However, if you have many images with void/ignore pixels, it may not be the right solution.
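
A slightly more defensive variant is sketched below as a standalone function, so it can be tested without the repository. The name `ohem_cross_entropy`, its signature, and the default values are assumptions for illustration, not the repository's API. Instead of only guarding the threshold, it returns a zero loss whenever no pixel is left to average over.

```python
import torch
import torch.nn.functional as F


def ohem_cross_entropy(score, target, thresh=0.7, min_kept=100000, ignore_label=255):
    """OHEM-style cross-entropy that tolerates targets with no valid pixels."""
    pixel_losses = F.cross_entropy(
        score, target, ignore_index=ignore_label, reduction="none"
    ).contiguous().view(-1)
    mask = target.contiguous().view(-1) != ignore_label
    if not mask.any():
        # Nothing to learn from: return a zero that stays attached to the graph.
        return score.sum() * 0.0

    # Confidence assigned to the ground-truth class at every valid pixel.
    tmp_target = target.clone()
    tmp_target[tmp_target == ignore_label] = 0
    pred = F.softmax(score, dim=1).gather(1, tmp_target.unsqueeze(1))
    pred, ind = pred.contiguous().view(-1)[mask].sort()

    min_value = pred[min(min_kept, pred.numel() - 1)]
    threshold = max(min_value.item(), thresh)

    kept = pixel_losses[mask][ind][pred < threshold]
    if kept.numel() == 0:
        # The mean of an empty tensor is NaN, so guard this case as well.
        return score.sum() * 0.0
    return kept.mean()
```

Whether a zero loss is the right choice for such batches depends on the dataset; skipping the batch entirely would be another option.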

tomihisaw avatar Aug 03 '22 14:08 tomihisaw

I also encountered this problem. Is there a solution? (Background still makes up a large part of my dataset.)

m828 avatar Aug 10 '22 03:08 m828

Did you try @tomihisaw's workaround? It works fine for me. Nearly 90% of my custom dataset is background only, and my training results are sufficient. The output during training didn't look very good to me (possibly caused by the class imbalance in general), but the final predictions and visualizations on a separate test set are OK.

Daniel595 avatar Aug 10 '22 07:08 Daniel595

Oh? I can run it, but it still doesn't work after a few epochs. I'll stop this run for now and try again later, thank you. image

m828 avatar Aug 10 '22 07:08 m828

I tried running the model, but the mIoU is still 0.1250. How is your data file written?

m828 avatar Aug 11 '22 01:08 m828

I'm using my own dataset class with a pandas DataFrame, so I don't think my data file would help you here. From your output, it looks to me like all other classes (1-7) are missing. Did you double-check that the segmentation masks have the right pixel values? I got a NaN loss when the learning rate was too high (e.g. x10 or x100). I also disabled multiscale since I didn't find any parameters that worked well. Maybe something here is helpful.
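
A quick way to double-check the masks is to look at the unique pixel values of each label image before training. The helper below is a hypothetical sketch (the name `check_label_mask` and the returned keys are made up), assuming class IDs run from 0 to num_classes - 1 and 255 marks ignored pixels.

```python
import numpy as np


def check_label_mask(label, num_classes, ignore_label=255):
    """Summarize which pixel values a label mask contains."""
    values = np.unique(label)
    # Values that are neither a valid class ID nor the ignore label.
    unexpected = [
        int(v) for v in values
        if v != ignore_label and not (0 <= v < num_classes)
    ]
    valid_fraction = float(np.mean(label != ignore_label))
    return {
        "values": values.tolist(),
        "unexpected": unexpected,
        "valid_fraction": valid_fraction,
    }


good = np.array([[0, 1], [255, 2]], dtype=np.uint8)
bad = np.array([[0, 38], [75, 255]], dtype=np.uint8)

print(check_label_mask(good, num_classes=11))
print(check_label_mask(bad, num_classes=11))
```

A `valid_fraction` of 0.0 means the whole image is ignored, which is the case that triggers the error in this issue.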

Daniel595 avatar Aug 11 '22 07:08 Daniel595

Hello, after applying the workaround above, my loss became NaN. Could you share the exact file? QQ: 2358919383, email: [email protected]
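
One possible source of such a NaN, offered as a guess rather than a diagnosis: in PyTorch the mean of an empty tensor is NaN, so a loss that averages over zero selected pixels becomes NaN even when the threshold itself is guarded. The `safe_mean` helper below is hypothetical and only illustrates the behavior.

```python
import torch


def safe_mean(values):
    """Mean that evaluates to 0 instead of NaN when nothing was selected."""
    if values.numel() == 0:
        return values.sum()  # the sum of an empty tensor is 0
    return values.mean()


empty = torch.empty(0)
print(empty.mean())                          # NaN
print(safe_mean(empty))                      # zero
print(safe_mean(torch.tensor([1.0, 3.0])))   # ordinary mean
```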

scl666 avatar Oct 12 '22 04:10 scl666

Hello, has a solution to this error been found? I ran into the same problem and would like to ask for advice.

scl666 avatar Oct 12 '22 04:10 scl666

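This problem comes up almost every time with OHEM (online hard example mining) loss. In your case, it mainly arises from this line of code:

https://github.com/XuJiacong/PIDNet/blob/f0ac91cdea7bf0cb2077b65e960c5b98b9173b0f/utils/utils.py#L53

bd_label = torch.where(torch.sigmoid(outputs[-1][:,0,:,:])>0.8, labels, filler)

Note that 0.8 here is the hard-example threshold set by the author; the official OHEM threshold is 0.7. When every element of bd_label is 1, meaning the third tensor in outputs (the output of segmentation head d) contains no hard examples at the 0.8 threshold, passing it to the OHEM forward leaves pred empty.

My understanding is that when there are no hard examples, the full labels should be fed into the OHEM loss instead. The change is as follows: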

try:
    bd_label = torch.where(torch.sigmoid(outputs[-1][:,0,:,:])>0.7, labels, filler)
    loss_sb = self.sem_loss([outputs[-2]], bd_label)
except Exception:  # fall back to the full labels when the OHEM loss fails on an empty selection
    loss_sb = self.sem_loss([outputs[-2]], labels)

(I used the official 0.7 threshold here.)
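
The same fallback can be written without try/except by checking explicitly whether any pixel survives the boundary mask. This is only a sketch: `boundary_target` is a hypothetical helper, and it assumes filler is a tensor filled with the ignore label (255 here), which the snippet above does not show.

```python
import torch


def boundary_target(boundary_logits, labels, threshold=0.8, ignore_label=255):
    """Labels restricted to confident boundary pixels, or all labels if none qualify."""
    filler = torch.full_like(labels, ignore_label)  # assumed definition of filler
    bd_label = torch.where(torch.sigmoid(boundary_logits) > threshold, labels, filler)
    if (bd_label != ignore_label).any():
        return bd_label
    # No pixel passed the threshold: fall back to the full label map.
    return labels


labels = torch.tensor([[[0, 1], [2, 1]]])                  # N, H, W
no_boundary = torch.full((1, 2, 2), -10.0)                 # sigmoid is far below 0.8
one_boundary = torch.tensor([[[10.0, -10.0], [-10.0, -10.0]]])

print(boundary_target(no_boundary, labels))
print(boundary_target(one_boundary, labels))
```

If labels itself contains only ignored pixels, this fallback does not help, and the loss still needs its own guard.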

QiaoShin avatar Nov 07 '22 05:11 QiaoShin

Hello, buddy. I have encountered the same problem as you. Have you finally solved this problem? If so, can you share your modified code with me? My email address is: [email protected]

17648145240 avatar Mar 29 '23 03:03 17648145240

I have the same error, but none of the solutions above solved my problem. I found the cause: there are many versions of CamVid on the internet, with labels that are either colored or black-and-white. If you use the colored version, the problem goes away and training runs normally.

Make sure your CamVid labels look like this:

image

instead of

image

Unusual6 avatar Aug 04 '23 07:08 Unusual6

Did you modify the __getitem__ function in the CamVid dataset class? The original CamVid dataset class uses the color2label method, which translates the colored map into grayscale label IDs. For images that are already grayscale, this may have produced weird class IDs that are outside the valid range. I mentioned it in https://github.com/XuJiacong/PIDNet/issues/65.
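
To illustrate how this can go wrong, here is a rough sketch of a color-to-ID conversion. The three-color PALETTE is made up for the example (it is not the CamVid palette), and this `color2label` is a standalone approximation, not the repository's method.

```python
import numpy as np

# Hypothetical three-class palette, for illustration only.
PALETTE = [(128, 0, 0), (0, 128, 0), (0, 0, 128)]


def color2label(color_map, color_list=PALETTE, ignore_label=255):
    """Map an H x W x 3 color image to class IDs; unmatched pixels become ignore_label."""
    label = np.full(color_map.shape[:2], ignore_label, dtype=np.uint8)
    for class_id, color in enumerate(color_list):
        label[(color_map == np.array(color)).all(axis=2)] = class_id
    return label


# A correctly colored label image.
colored = np.array([[PALETTE[0], PALETTE[1]],
                    [PALETTE[2], PALETTE[0]]], dtype=np.uint8)

# A label image that already stores class IDs, loaded as three identical channels.
gray_ids = np.array([[0, 1], [2, 0]], dtype=np.uint8)
gray_as_rgb = np.stack([gray_ids] * 3, axis=2)

print(color2label(colored))       # class IDs recovered from the colors
print(color2label(gray_as_rgb))   # no color matches, so every pixel is ignored
```

In this sketch the grayscale input ends up as an all-ignore mask, which matches the situation described in the issue title.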

pstemporowski avatar Aug 17 '23 10:08 pstemporowski

PID1 After making the change (0.8 to 0.7), I still get the same error. What could be the cause?

wakakkakak avatar Mar 10 '24 08:03 wakakkakak