
Some questions from the paper

Open xiaoshuliu opened this issue 6 years ago • 7 comments

Hi Xingang, could I ask some questions about the SCNN?

  1. What is the "spatial cross entropy loss" mentioned in the paper?
  2. Are the labels pixel-level segmentation maps of the same size as the input image?
  3. I am currently doing lane segmentation with a network of some conv layers followed by some deconv layers. Should I insert SCNN right after the conv layers and before the deconv layers?

Thanks!

xiaoshuliu avatar Jul 09 '18 01:07 xiaoshuliu

  1. Check this: https://pytorch.org/docs/stable/nn.html#crossentropyloss
  2. Yes.
  3. I haven't tried this, and it's hard to say. I would recommend inserting SCNN right after the conv layers, since this would save computation (see the sketch below).
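For reference, here is a minimal PyTorch sketch of the downward ("D") message-passing pass that SCNN adds on top of a conv feature map; the module name, default channel count, and kernel width are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCNN_D(nn.Module):
    """Downward spatial message passing (sketch): each row of the
    feature map receives information from the row above it."""
    def __init__(self, channels=128, kernel_w=9):
        super().__init__()
        # one 1 x kernel_w convolution shared across all row-to-row updates
        self.conv = nn.Conv2d(channels, channels, kernel_size=(1, kernel_w),
                              padding=(0, kernel_w // 2), bias=False)

    def forward(self, x):                       # x: (N, C, H, W)
        rows = list(torch.split(x, 1, dim=2))   # H slices of shape (N, C, 1, W)
        for i in range(1, len(rows)):
            # each slice adds the conv + ReLU of the (already updated) slice above
            rows[i] = rows[i] + F.relu(self.conv(rows[i - 1]))
        return torch.cat(rows, dim=2)
```

The full SCNN_DULR module would repeat the same sequential pass upward, and then leftward and rightward (slicing along `dim=3` instead of `dim=2`).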

XingangPan avatar Jul 13 '18 05:07 XingangPan

@XingangPan Thank you. I think the link you provided for question 1 is an explanation of the normal cross entropy loss, but what is the difference between "spatial cross entropy loss" and "cross entropy loss"? Both terms appear in Fig. 5 of your paper.

xiaoshuliu avatar Jul 13 '18 17:07 xiaoshuliu

@XingangPan One more question, please: have you tested the inference time of SCNN with VGG or ResNet? Is the time reported in Table 6 ("SCNN_DULR: 42 ms") only for the SCNN part, or for the full segmentation task?

xiaoshuliu avatar Jul 13 '18 17:07 xiaoshuliu

@xiaoshuliu Spatial cross entropy loss means applying the cross entropy loss at each spatial location (a small example is shown below). The time reported is only for the SCNN part.
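To make the distinction concrete, here is a minimal PyTorch example of cross entropy applied per spatial location; the class count and image size are placeholder values:

```python
import torch
import torch.nn as nn

# nn.CrossEntropyLoss accepts a (N, C, H, W) score map and an (N, H, W)
# integer label map; it computes the cross entropy at every pixel and
# averages over all spatial locations.
criterion = nn.CrossEntropyLoss()

scores = torch.randn(2, 5, 288, 800)         # N, num_classes, H, W (sizes assumed)
labels = torch.randint(0, 5, (2, 288, 800))  # per-pixel class ids
loss = criterion(scores, labels)
print(loss.item())
```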

XingangPan avatar Jul 30 '18 13:07 XingangPan

Hi, I would like to ask some questions about the model structure and implementation code. I have read the implementation of LargeFOV here: https://github.com/DrSleep/tensorflow-deeplab-lfov/blob/master/deeplab_lfov/model.py

  • In their implementation, they added 2 more conv layers (and dropout) after the original VGG model. However, I could not find these extra layers in your implementation, although they are present in this author's TensorFlow implementation: https://github.com/cardwing/Codes-for-Lane-Detection/blob/master/SCNN-Tensorflow/lane-detection-model/encoder_decoder_model/vgg_encoder.py Could you explain the difference between these implementations? If I have misunderstood your paper at some point, please help me get it right. Thank you for considering my question.

ivo-gilles avatar Dec 10 '18 13:12 ivo-gilles

@aquariusnk My implementation and https://github.com/cardwing/Codes-for-Lane-Detection/blob/master/SCNN-Tensorflow/lane-detection-model/encoder_decoder_model/vgg_encoder.py are the same. If you print the provided vgg.t7 model, you get the layer listing shown in the attached image (vggmodel). In (43) we use 128 channels in order to save memory. (44) is another branch that classifies the existence of lane markings, as discussed in the paper (sketched below). From (1) to (43) there are 16 layers, as in the original VGG16.
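For intuition only, a toy version of such an existence branch might look like the following; the pooling choice, channel count, and number of lanes are assumptions rather than the released architecture:

```python
import torch
import torch.nn as nn

class ExistenceBranch(nn.Module):
    """Toy sketch of a lane-existence head: pool the 128-channel feature
    map and emit one existence probability per lane marking."""
    def __init__(self, in_channels=128, num_lanes=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # the real model uses fixed pool/fc sizes
        self.fc = nn.Linear(in_channels, num_lanes)

    def forward(self, x):                  # x: (N, 128, H, W)
        x = self.pool(x).flatten(1)        # (N, 128)
        return torch.sigmoid(self.fc(x))   # per-lane existence probability
```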

XingangPan avatar Dec 10 '18 13:12 XingangPan

@XingangPan thank you for the clear explanation. I will read your code again to fully understand your ideas. Since I am not familiar with Torch/Lua (I mostly work with TensorFlow), I could not find the part where you define layer (43) on a first read, so I will try reading it again. Thank you again for the brilliant work.

ivo-gilles avatar Dec 11 '18 03:12 ivo-gilles