Pyramid-Attention-Networks-pytorch

Did the classification module help?

Open chenyzh28 opened this issue 6 years ago • 18 comments

Thanks for your work! I removed the classification module and its related loss, and the performance is about 77%. I wonder whether you have run experiments with the classification loss (it seems to serve as a guide for segmentation, provided it is not detached from the preceding layers).

chenyzh28 avatar Feb 23 '19 15:02 chenyzh28

How did you get it to run? I set it up the same way as you and get this error: /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [452,0,0] Assertion t >= 0 && t < n_classes failed. Could you give some advice? Thank you!

chezhizhong avatar Mar 05 '19 12:03 chezhizhong

This problem may be caused by a label value that falls outside the range 0 to n_classes - 1. Have you pre-processed the VOC data following https://www.sun11.me/blog/2018/how-to-use-10582-trainaug-images-on-DeeplabV3-code/ ?
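One way to confirm this before training is to scan a label mask for values the loss cannot accept. The following is a minimal sketch, not code from this repository: the helper name `find_invalid_labels`, the constants, and the sample mask are all hypothetical, and it assumes 21 classes with 255 used as the VOC ignore label.

```python
import numpy as np

N_CLASSES = 21      # assumption: 20 VOC object classes + background
IGNORE_LABEL = 255  # assumption: VOC "void" border label


def find_invalid_labels(mask, n_classes=N_CLASSES, ignore_label=IGNORE_LABEL):
    """Return label values that are neither valid class ids nor the ignore label."""
    values = np.unique(mask)
    out_of_range = (values < 0) | (values >= n_classes)
    bad = values[out_of_range & (values != ignore_label)]
    return bad.tolist()


# Hypothetical mask: 220 is not a valid class id, so it is reported.
mask = np.array([[0, 15, 255], [220, 15, 0]], dtype=np.int64)
invalid = find_invalid_labels(mask)
print("invalid label values:", invalid)
```

Note that the value 255 would itself trigger the assertion unless the loss is configured to ignore it, so a clean result here only rules out the other out-of-range values.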

chenyzh28 avatar Mar 05 '19 12:03 chenyzh28

Thanks for your reply! I ran my code again and got a new error at this line: pixel_acc += mask_pred.max(dim=1)[1].data.cpu().eq(mask_labels.squeeze(1).cpu()).float().mean(). I checked the shapes of the two tensors and they are identical.

chezhizhong avatar Mar 05 '19 12:03 chezhizhong

What is the error message?

chenyzh28 avatar Mar 05 '19 12:03 chenyzh28

Traceback (most recent call last):
  File "train.py", line 205, in <module>
    train(epoch, optimizer, training_loader)
  File "train.py", line 131, in train
    pixel_acc += mask_pred.max(dim=1)[1].data.cpu().eq(mask_labels.squeeze(1).cpu()).float().mean()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/generic/THCTensorCopy.cpp:70

chezhizhong avatar Mar 05 '19 12:03 chezhizhong

This is because your GPU does not have enough memory for the training process. You can decrease the batch size and try again.

chenyzh28 avatar Mar 05 '19 13:03 chenyzh28

Sorry to bother you again. I got the same error again: /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [0,0,0], thread: [228,0,0] Assertion t >= 0 && t < n_classes failed. How can I resolve this error? How should the maximum label value be set?

chezhizhong avatar Mar 05 '19 13:03 chezhizhong

You can follow this link https://www.sun11.me/blog/2018/how-to-use-10582-trainaug-images-on-DeeplabV3-code/ and re-construct your dataset.

chenyzh28 avatar Mar 05 '19 13:03 chenyzh28

I'm sorry to bother you. I want to ask about this line in dataset.py: "labels = np.load('/home/liekkas/DISK2/jian/PASCAL/VOC2012/cls_labels.npy')[()]". I can't find cls_labels.npy. What should I do to solve this problem? (We are clearly all Chinese, yet I still have to ask for help in my poor English. So tiring, sob.)

zhenmafan7 avatar Apr 08 '19 12:04 zhenmafan7

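These are the classification labels, which the author did not provide. You can remove the classification branch and then comment out this line so that no classification labels are returned.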

chenyzh28 avatar Apr 08 '19 12:04 chenyzh28

I know this is a bit of a silly question, but I still want to ask: which part is the classification branch? TAT

zhenmafan7 avatar Apr 08 '19 12:04 zhenmafan7

The classifier module (not mask_classifier) is the classification branch. Just delete all the parts related to it.

chenyzh28 avatar Apr 08 '19 13:04 chenyzh28

Thank you. I understand it in theory, but actually doing it... I will work it out slowly, very slowly (it is really hard, TAT). Thank you very much!

zhenmafan7 avatar Apr 08 '19 13:04 zhenmafan7

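Sorry to bother you again. I removed the classification branch as you suggested and trained the model. When I then ran eval.py, I added a print statement to show the test results (without it I see no output at all, and I am not sure whether I changed something incorrectly in the code). The printed results are:
Length of test set:1449
Each_cls_IOU:{'background': 0.0, 'aeroplane': 0.0, 'bicycle': 0.0, 'bird': 0.0, 'boat': 0.0, 'bottle': 0.0, 'bus': 0.0, 'car': 0.0, 'cat': 0.0, 'chair': 0.0, 'cow': 0.0, 'diningtable': 0.0, 'dog': 0.0, 'horse': 0.0, 'motorbike': 0.0, 'person': 18.41086525189786, 'pottedplant': 0.0, 'sheep': 0.0, 'sofa': 0.0, 'train': 0.0, 'tvmonitor': 0.0}
mIOU:0.8767 PA:5.03% loss_ic0.000000
This is very strange. Could I compare my code against your modified files?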
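As a quick sanity check on these numbers, the reported mIOU matches a plain average of the 21 per-class values on the same percentage scale. The sketch below only reproduces that arithmetic from the printed output; it assumes eval.py averages over all 21 classes, which this thread does not confirm.

```python
# Class names copied from the printed Each_cls_IOU dictionary.
VOC_CLASSES = [
    'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

# Every class scored 0.0 except 'person'.
per_class_iou = {name: 0.0 for name in VOC_CLASSES}
per_class_iou['person'] = 18.41086525189786

# Assumption: mIOU is the unweighted mean over all classes.
miou = sum(per_class_iou.values()) / len(per_class_iou)
print(f"mIOU:{miou:.4f}")
```

So the low mIOU comes entirely from a single class having a non-zero IoU; the other 20 classes contribute nothing to the mean.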

zhenmafan7 avatar Apr 09 '19 02:04 zhenmafan7

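Hello, did you manage to get this open-source code working? I recently started trying it as well and ran into the same problem as you. I could not find an email address on your profile page. May I contact you?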

LeeThrzz avatar May 08 '19 02:05 LeeThrzz

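The result above is what I got after training, and I have not tried again since. I have made my email address public on my profile, so you can contact me there, but I have given up on this code, so I am not sure whether I can still help you.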

zhenmafan7 avatar May 08 '19 03:05 zhenmafan7

If you use torch.nn.CrossEntropyLoss, setting ignore_index may help.

Pascal VOC uses label 255 as an ignore label; it marks the white border drawn around objects. If your masks contain this border, just ignore the border ID.
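A minimal sketch of that suggestion follows. The tensor shapes and random data are made up for illustration, and the sketch assumes 21 classes with 255 as the border label; it is not taken from this repository's train.py.

```python
import torch
import torch.nn as nn

IGNORE_LABEL = 255  # assumption: VOC border / void label
N_CLASSES = 21      # assumption: 20 object classes + background

# Pixels whose target equals ignore_index contribute nothing to the loss.
criterion = nn.CrossEntropyLoss(ignore_index=IGNORE_LABEL)

logits = torch.randn(2, N_CLASSES, 4, 4)         # (batch, classes, H, W)
target = torch.randint(0, N_CLASSES, (2, 4, 4))  # int64 class ids per pixel
target[:, 0, :] = IGNORE_LABEL                   # mark the top row as border

loss = criterion(logits, target)
print(loss.item())
```

Without ignore_index, the 255 entries would be treated as class ids and fail the range check t >= 0 && t < n_classes quoted earlier in this thread.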

FantasyJXF avatar Aug 21 '19 03:08 FantasyJXF

Can the loss from the classification branch improve performance? What is it for?

LiouCZ avatar May 15 '20 13:05 LiouCZ