
Where is the Xception-like network code described in the original paper?

Open machanic opened this issue 7 years ago • 17 comments

Where is the Xception-like network code described in the original paper?

machanic avatar Apr 06 '18 13:04 machanic

@zengarden, Would you mind providing the code and model of the Xception-like network? I've tried to implement and train it on ImageNet; the best accuracy I get is around 64%. Would you mind sharing the relevant code and your experience with it?

Best,

foreverYoungGitHub avatar Apr 11 '18 14:04 foreverYoungGitHub

@foreverYoungGitHub I'm reproducing the efficient Xception-like network in TF. Since the Xception network mentioned in the paper was built and trained on our internal platform, I need some time to port it to TF and verify the accuracy.

zengarden avatar Apr 11 '18 15:04 zengarden

Hi, I have also tried the original Xception backbone in my re-implementation and found it works too. The forward time during training for the backbone network is about 20 ms. But it may be better to switch to the Xception-like variant for better accuracy and speed.

HiKapok avatar Apr 18 '18 08:04 HiKapok

@HiKapok Could you please tell me the speed and mAP at test time?

MaskVulcan avatar Apr 24 '18 08:04 MaskVulcan

@MaskVulcan I didn't measure the speed at test time. In fact, the NMS op in TensorFlow is a little slow, which makes it a bottleneck of the full detection pipeline. You can find the code on my GitHub if you are interested, but I still recommend waiting for the official release to ensure the speed.

HiKapok avatar Apr 24 '18 08:04 HiKapok
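For readers wondering why NMS is a bottleneck: greedy NMS is inherently sequential, since each kept box must suppress its overlapping candidates before the next box can be selected. A minimal pure-Python sketch (illustrative only, not the TensorFlow op):

```python
# Minimal greedy NMS sketch. Boxes are (x1, y1, x2, y2) tuples.
# The while-loop below is the sequential dependency that makes the
# op hard to parallelize on GPU.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlaps, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

Production implementations (e.g. a compiled CUDA kernel like the `lib_fast_nms` mentioned later in this thread) exist precisely because this loop is slow in a graph framework.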

@HiKapok OK, thanks! Would you mind telling me your mAP with Xception?

MaskVulcan avatar Apr 24 '18 08:04 MaskVulcan

@MaskVulcan 74.5 mAP now on PASCAL VOC 2007 test.

HiKapok avatar Apr 24 '18 08:04 HiKapok

Any update about the xception-like network?

jiang1st avatar Apr 25 '18 14:04 jiang1st

Hi, can someone provide some insights into the Xception-like model? I think I can implement the training and the other pieces, but I'm not sure I understood the model well. I have some questions:

  • Is the last layer an FC-1000? I don't know which layer the Light-Head R-CNN head should connect to.

  • What does this expression (from the original paper) mean? "Following xception design strategies, we replace all convolution layer in bottle-neck structure with channel-wise convolution. However we do not use the pre-activation design which is proposed in identity mappings [10] because of shallow network."

  • Should I replace the 7 convolutional layers (the first layer and stages 1, 2, and 3) with bottleneck-structure convolution layers using channel-wise convolution?

  • Is this the large separable convolution? Lines 785-792: https://github.com/terrychenism/Deformable-ConvNets/blob/master/rfcn/symbols/resnet_v1_101_rfcn_light.py#L785
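For context on that last question: the large separable convolution in the paper factorizes a large k×k conv into two branches of k×1 / 1×k convs whose outputs are summed. A sketch as ordered op lists, with shapes taken from my reading of the paper's "L" setting (k=15, C_mid=256, C_out=10·7·7=490) — names and defaults here are illustrative, not the repo's exact symbols:

```python
# Large separable convolution: two factorized branches, summed.
# Branch A: (k x 1) conv then (1 x k) conv; Branch B: the transpose order.
# The summed output is the "thin feature map" fed to PSRoI pooling.

def large_separable_branch_a(k=15, c_mid=256, c_out=490):
    return [f"conv{k}x1(out={c_mid})", f"conv1x{k}(out={c_out})"]

def large_separable_branch_b(k=15, c_mid=256, c_out=490):
    return [f"conv1x{k}(out={c_mid})", f"conv{k}x1(out={c_out})"]

def large_separable_conv(k=15, c_mid=256, c_out=490):
    return {
        "branch_a": large_separable_branch_a(k, c_mid, c_out),
        "branch_b": large_separable_branch_b(k, c_mid, c_out),
        "merge": "elementwise_sum",
    }
```

Checking the linked `resnet_v1_101_rfcn_light.py` lines against this two-branch sum is a reasonable way to confirm whether that code is the same construct.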

@HiKapok Good job, though 74.5 mAP on Pascal VOC is inferior to YOLOv2, considering that Light-Head R-CNN should have a better mAP.

Thank you in advance, guys.

emedinac avatar May 14 '18 21:05 emedinac

@zengarden How is the TF implementation going? Have you finished it? Or could you please just release the structure of the Xception network? I think lots of people are waiting for it. Cheers.

YanShuo1992 avatar Jul 07 '18 11:07 YanShuo1992

@foreverYoungGitHub Hello. Would you mind telling me your mAP and FPS on MS COCO using the Xception-like backbone in Light-Head R-CNN? I also tried, but I did not reach 30.7 mAP at 700*1100 size.

geonseoks avatar Jul 16 '18 07:07 geonseoks

How many layers does the Xception-like backbone in Light-Head R-CNN have? Are they 17 conv layers (based on Table 7) plus 2 MLP layers (from the original code)? Edited: @geonseoks I computed the mAP using test.py (cocoapi metric) and I got this:

evaluation epoch 20
loading annotations into memory...
Done (t=1.20s)
creating index...
index created!
Loading and preparing results...
DONE (t=6.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=75.63s).
Accumulating evaluation results...
DONE (t=14.30s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.040
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.085
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.034
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.019
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.075
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.067
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.086
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.088
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.036
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.172

I'm obtaining approximately 20-21 FPS using 2 images in testing, though occasionally it drops to 10-12 FPS.

emedinac avatar Jul 16 '18 13:07 emedinac

@edgarmedina1801 Yes, I use Conv1~FC from Table 7.

And here are my answers.

Is the last layer an FC-1000? Because I don't know which layer to connect the Light-Head R-CNN to. -> I connect the Stage 3 layer to the RPN's input and the Stage 4 layer to the global context module's input (setting the stride to 1 at Stage 4).

What does this expression (from the original paper) mean? "Following xception design strategies, we replace all convolution layer in bottle-neck structure with channel-wise convolution. However we do not use the pre-activation design which is proposed in identity mappings [10] because of shallow network." -> I change all 3*3 convs in Stages 2~4 to channel-wise convs. The pre-activation design is described in paper [10]. So you can understand the backbone as an Xception-like ResNet, since ResNet has the bottleneck structure. I think this backbone is similar to MobileNet V2.

geonseoks avatar Jul 25 '18 13:07 geonseoks
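The two block layouts being contrasted above can be sketched as ordered op lists. This is my reading of the design, not the official code: the Xception-like bottleneck keeps the standard post-activation ResNet ordering (conv → BN → ReLU) but swaps the 3×3 conv for a channel-wise (depthwise) conv, while the pre-activation variant from the identity-mappings paper [10] is the design the authors say they do NOT use:

```python
# Op-name lists are illustrative; only the ordering and the depthwise
# substitution matter here.

def postact_channelwise_bottleneck():
    """ResNet bottleneck, post-activation, with a channel-wise 3x3 conv."""
    return [
        "conv1x1", "bn", "relu",            # reduce channels
        "depthwise_conv3x3", "bn", "relu",  # channel-wise spatial conv
        "conv1x1", "bn",                    # restore channels
        "add_identity", "relu",             # residual sum, then activation
    ]

def preact_bottleneck():
    """Pre-activation bottleneck (identity mappings [10]) -- NOT used here."""
    return [
        "bn", "relu", "conv1x1",
        "bn", "relu", "conv3x3",
        "bn", "relu", "conv1x1",
        "add_identity",                     # no activation after the sum
    ]
```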

@geonseoks Thanks a lot for elaborating on the Xception network. I have another question, if you don't mind: when the author says channel-wise convolution, is it tf.nn.depthwise_conv2d or tf.nn.separable_conv2d in TensorFlow?

karansomaiah avatar Jan 17 '19 15:01 karansomaiah

@karansomaiah Good question. I use separable_conv2d; I will try depthwise_conv2d later.

geonseoks avatar Jan 19 '19 05:01 geonseoks
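For anyone weighing the two ops: tf.nn.depthwise_conv2d applies one k×k filter per input channel with no channel mixing, while tf.nn.separable_conv2d is that depthwise step followed by a 1×1 pointwise conv that mixes channels. A rough parameter-count sketch (pure Python, ignoring biases and BN) shows why the separable form is so much cheaper than a standard conv:

```python
# Parameter counts for a k x k conv layer with c_in input and c_out
# output channels. Illustrative arithmetic only.

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def depthwise_params(k, c_in):
    return k * k * c_in                      # one spatial filter per channel

def separable_params(k, c_in, c_out):
    # depthwise spatial step + 1x1 pointwise channel-mixing step
    return depthwise_params(k, c_in) + 1 * 1 * c_in * c_out
```

For k=3 and 256 channels in and out, this gives 589,824 parameters for the standard conv versus 67,840 for the separable one, roughly a 9x reduction.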

Hey @geonseoks, I tried implementing with the information you provided, but I get a lot of NaN values in the loss very early in training. I even checked your code:

  1. I still face the same issue.
  2. In line 157, where proposal_opr is applied, I cannot use is_tfnms=False since it gives me an error saying lib_kernel.lib_fast_nms was not found. I checked further, and it seems @zengarden has removed it in the latest commit. Any help would be appreciated.

Thanks in advance.

karansomaiah avatar Jan 23 '19 14:01 karansomaiah

Do you guys change the spatial_scale argument for the PSAlign code in your network definition for the Xception network? @geonseoks @edgarmedina1801 @HiKapok

karansomaiah avatar Mar 11 '19 20:03 karansomaiah