Detectron.pytorch

I ran into a problem while implementing light_head_rcnn

Open hewumars opened this issue 6 years ago • 14 comments

loss_bbox does not converge. The other losses (loss_cls, loss_rpn_cls, loss_bbox) do converge. Can I push the code to you for debugging?

hewumars avatar May 06 '18 08:05 hewumars

Hi, where did you take PSRoIPool layer from? A lot of PyTorch implementations of that layer are bugged. Also, they are probably implemented for single image batch and might not work with multiple images per batch.
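As a rough illustration of the multi-image concern, here is a minimal pure-Python sketch. It assumes the common Detectron-style ROI layout `(batch_idx, x1, y1, x2, y2)`; the helper name `select_feature_maps` is hypothetical and not taken from any of the repositories discussed here. A pooling layer written only for single-image batches effectively ignores `batch_idx` and always reads from image 0.

```python
def select_feature_maps(features, rois):
    """Pair each ROI with the feature map of the image it belongs to.

    features: list with one feature map per image in the batch.
    rois: iterable of (batch_idx, x1, y1, x2, y2) rows (assumed layout).
    """
    pairs = []
    for batch_idx, x1, y1, x2, y2 in rois:
        batch_idx = int(batch_idx)
        if not 0 <= batch_idx < len(features):
            raise IndexError("ROI refers to image %d, batch has %d images"
                             % (batch_idx, len(features)))
        # A single-image implementation would use features[0] here,
        # which silently pools from the wrong image when batch_idx > 0.
        pairs.append((features[batch_idx], (x1, y1, x2, y2)))
    return pairs
```

With a batch size of 1 every `batch_idx` is 0, so such a bug would not show up until more images per batch are used.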

Rizhiy avatar May 07 '18 12:05 Rizhiy

PSRoI_Align is from https://github.com/zengarden/light_head_rcnn and PSRoIPooling is from https://github.com/PureDiors/pytorch_RFCN
I set the batch size to 1 when training light_head_rcnn. The code seems to be able to work with multiple images per batch, but at least a single image per batch works.

hewumars avatar May 07 '18 13:05 hewumars

I'm pretty sure PSRoIPooling in that repo is bugged, see: https://github.com/PureDiors/pytorch_RFCN/issues/4.

Rizhiy avatar May 07 '18 14:05 Rizhiy

The light head rcnn model also does not converge when using PSRoI_Align from https://github.com/zengarden/light_head_rcnn. I opened a pull request: https://github.com/roytseng-tw/Detectron.pytorch/pull/48

hewumars avatar May 07 '18 15:05 hewumars

I will carefully check the code

hewumars avatar May 07 '18 15:05 hewumars

@Rizhiy could you share your PSRoIPooling? I compared the code with https://github.com/msracver/Deformable-ConvNets/blob/master/rfcn/operator_cxx/psroi_pooling.cu and the differences are shown here: [image]

hewumars avatar May 08 '18 01:05 hewumars

@hewumars I haven't yet got PSRoIPooling to work in PyTorch either.

Rizhiy avatar May 08 '18 20:05 Rizhiy

@Rizhiy How is the PSRoI pooling going? I have seen you in many different repos. I think we are both focusing on light-head rcnn, right? I haven't got PSRoIPooling working in PyTorch either. I think it could be easier to use the code from the official TF implementation.

YanShuo1992 avatar Jul 17 '18 07:07 YanShuo1992

@YanShuo1992 I'm currently using roytseng-tw/Detectron.pytorch, so far I have focused on getting the best mAP, so didn't put much work in light-head. I will try to let you know if I get something working.

Rizhiy avatar Jul 17 '18 12:07 Rizhiy

@hewumars @Rizhiy I checked @hewumars's light head rcnn code and I might have found something wrong. PSRoIPooling is applied after res5 (stage 5) of ResNet-50, right? But the RPN is still after stage 4. What do you think?

YanShuo1992 avatar Jul 19 '18 06:07 YanShuo1992

That's not entirely correct. You need to pass the output of res5 through a layer which has k*k*n filters, where k is the pooling size and n is an arbitrary number of layers (10 in the paper). Then you apply psroipool on that.

I suggest you check https://github.com/msracver/Deformable-ConvNets/blob/f4e163719c8e63cfad7af1caaaab93d373750393/rfcn/symbols/resnet_v1_101_rfcn.py#L785-L798 for reference.
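To make the k*k*n layout concrete, here is a minimal pure-Python sketch of position-sensitive ROI average pooling for a single ROI. It is an illustration under stated assumptions, not the code of any repository linked in this thread: the function names `ps_channel` and `psroi_pool` are hypothetical, the channel ordering `(c_out * k + bin_y) * k + bin_x` is assumed to follow the R-FCN style reference kernels, and the ROI coordinates are not rounded before binning.

```python
import math


def ps_channel(c_out, bin_y, bin_x, k):
    """Input channel that feeds output channel c_out at bin (bin_y, bin_x)."""
    return (c_out * k + bin_y) * k + bin_x


def psroi_pool(feat, roi, k, n_out, spatial_scale=1.0):
    """Average-pool one ROI from a score map feat[C][H][W] with C == k*k*n_out.

    roi is (x1, y1, x2, y2) in image coordinates.
    Returns a nested list shaped [n_out][k][k].
    """
    channels, height, width = len(feat), len(feat[0]), len(feat[0][0])
    assert channels == k * k * n_out, "score map needs k*k*n_out channels"
    x1, y1, x2, y2 = (v * spatial_scale for v in roi)
    bin_h = max(y2 - y1, 0.1) / k
    bin_w = max(x2 - x1, 0.1) / k
    out = [[[0.0] * k for _ in range(k)] for _ in range(n_out)]
    for c_out in range(n_out):
        for by in range(k):
            for bx in range(k):
                # Bin extent on the feature map, clipped to its borders.
                h0 = min(max(math.floor(y1 + by * bin_h), 0), height)
                h1 = min(max(math.ceil(y1 + (by + 1) * bin_h), 0), height)
                w0 = min(max(math.floor(x1 + bx * bin_w), 0), width)
                w1 = min(max(math.ceil(x1 + (bx + 1) * bin_w), 0), width)
                if h1 <= h0 or w1 <= w0:
                    continue  # empty bin keeps its 0.0 output
                # Each bin reads only from its own dedicated channel.
                plane = feat[ps_channel(c_out, by, bx, k)]
                total = sum(plane[y][x]
                            for y in range(h0, h1) for x in range(w0, w1))
                out[c_out][by][bx] = total / ((h1 - h0) * (w1 - w0))
    return out
```

With k = 7 and n = 10 the score map has 7 * 7 * 10 = 490 channels, and the last bin of the last output channel reads from channel index 489.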

Rizhiy avatar Jul 19 '18 18:07 Rizhiy

@Rizhiy I will check the official rfcn to see how the rpn and the large conv are organized. @roytseng-tw I am trying to implement light rcnn based on your code. I tried the code from @hewumars and I get RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generic/THCStorage.cu:58

So I checked the .cu code of psroipooling. I found your commit that removes rounding in roialign_kernel.cu. Can you tell me the reason for that, or what problem rounding would cause?
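For background on the rounding question, the sketch below shows what quantizing ROI coordinates does numerically. It is general context about RoIPool-style rounding versus RoIAlign-style continuous coordinates, not the repository author's stated reason for the commit; the function name `roi_to_feature_coords` and the example numbers are made up for illustration.

```python
import math


def roi_to_feature_coords(roi, spatial_scale, quantize):
    """Map an image-space ROI (x1, y1, x2, y2) onto the feature map.

    quantize=True mimics RoIPool-style rounding to the nearest cell;
    quantize=False keeps the continuous coordinates, as RoIAlign does.
    """
    if quantize:
        return tuple(float(math.floor(v * spatial_scale + 0.5)) for v in roi)
    return tuple(v * spatial_scale for v in roi)


roi = (10, 20, 75, 130)
print(roi_to_feature_coords(roi, 1 / 16, quantize=False))  # (0.625, 1.25, 4.6875, 8.125)
print(roi_to_feature_coords(roi, 1 / 16, quantize=True))   # (1.0, 1.0, 5.0, 8.0)
```

At stride 16, a shift of 0.375 feature cells corresponds to 6 pixels in the image, so rounding moves the pooled region away from the box the regression targets were computed for.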

YanShuo1992 avatar Jul 20 '18 00:07 YanShuo1992

@YanShuo1992 do you hit the out-of-memory error after some iterations? I have the same problem. I compared the psroi code with caffe2 and couldn't find anything, but I have barely done any CUDA coding, so... Did you solve the problem?

GYxiaOH avatar Aug 22 '18 09:08 GYxiaOH

@GYxiaOH Yes, I hit the out-of-memory error when using psroi. I also checked the caffe2 code and the TensorFlow code and found nothing. For now, I have given up on psroi and use alignroi instead.

YanShuo1992 avatar Aug 23 '18 00:08 YanShuo1992