
Encoder-decoder framework for image reconstruction using event cameras

ChidanandKumarKS opened this issue 3 years ago • 2 comments

I tried to use StereoSpike, which is built on your repo, to do image reconstruction from an event camera rather than depth estimation. But I am unable to get a good reconstructed image using event-camera input.

Requesting your help in this regard.

Below are the encoder and decoder written with the spikingjelly repo. I am getting bad image reconstructions with no texture.

fangwei123456 directed me to you for help in this regard.

class StereoSpike(NeuromorphicNet):
    """
    Baseline model, with which we report state-of-the-art performances in the second version of our paper.

    - all neuron potentials must be reset at each timestep
    - predict_depth layers do have biases, but it is equivalent to remove them and reset output I-neurons to the sum
      of all 4 biases, instead of 0.
    """
def __init__(self, surrogate_function=surrogate.ATan(), detach_reset=True, v_threshold=1.0, v_reset=0.0, multiply_factor=1.):
    super().__init__(surrogate_function=surrogate_function, detach_reset=detach_reset)

    # bottom layer, preprocessing the input spike frame without downsampling
    self.bottom = nn.Sequential(
        nn.Conv2d(in_channels=5, out_channels=32, kernel_size=5, stride=1, padding=2, bias=False),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )

    # encoder layers (downsampling)
    self.conv1 = nn.Sequential(
        nn.Conv2d(in_channels=32, out_channels=64, kernel_size=5, stride=2, padding=2, bias=False),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )
    self.conv2 = nn.Sequential(
        nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=2, padding=2, bias=False),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )
    self.conv3 = nn.Sequential(
        nn.Conv2d(in_channels=128, out_channels=256, kernel_size=5, stride=2, padding=2, bias=False),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )
    self.conv4 = nn.Sequential(
        nn.Conv2d(in_channels=256, out_channels=512, kernel_size=5, stride=2, padding=2, bias=False),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )

    # residual layers
    self.bottleneck = nn.Sequential(
        SEWResBlock(512, v_threshold=self.v_th, v_reset=self.v_rst, connect_function='ADD', multiply_factor=multiply_factor),
        SEWResBlock(512, v_threshold=self.v_th, v_reset=self.v_rst, connect_function='ADD', multiply_factor=multiply_factor),
    )

    # decoder layers (upsampling)
    self.deconv4 = nn.Sequential(
        NNConvUpsampling2(in_channels=512, out_channels=256, kernel_size=3, scale_factor=2),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )
    self.deconv3 = nn.Sequential(
        NNConvUpsampling2(in_channels=256, out_channels=128, kernel_size=3, scale_factor=2),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )
    self.deconv2 = nn.Sequential(
        NNConvUpsampling2(in_channels=128, out_channels=64, kernel_size=3, scale_factor=2),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )
    self.deconv1 = nn.Sequential(
        NNConvUpsampling2(in_channels=64, out_channels=32, kernel_size=3, scale_factor=2),
        MultiplyBy(multiply_factor),
        neuron.IFNode(v_threshold=self.v_th, v_reset=self.v_rst, surrogate_function=self.surrogate_fct, detach_reset=True),
    )

    # these layers output depth maps at different scales, where depth is represented by the potential of IF neurons
    # that do not fire ("I-neurons"), i.e., with an infinite threshold.
    self.predict_depth1 = nn.Sequential(
        NNConvUpsampling2(in_channels=32, out_channels=1, kernel_size=3, scale_factor=1, bias=True),
        MultiplyBy(multiply_factor),
    )

    self.Ineurons = neuron.IFNode(v_threshold=float('inf'), v_reset=0.0, surrogate_function=self.surrogate_fct)
    self.sigmoid = nn.Sigmoid()
    self.num_encoders = 4

def forward(self, x, pred):  # note: `pred` is not used anywhere in this forward pass

    # x must be of shape [batch_size, num_frames_per_depth_map, 4 (2 cameras - 2 polarities), W, H]
    frame = x

    # data is fed in through the bottom layer
    out_bottom = self.bottom(frame)

    # pass through encoder layers
    out_conv1 = self.conv1(out_bottom)
    out_conv2 = self.conv2(out_conv1)
    out_conv3 = self.conv3(out_conv2)
    out_conv4 = self.conv4(out_conv3)

    # pass through residual blocks
    out_rconv = self.bottleneck(out_conv4)

    # gradually upsample while concatenating and passing through skip connections
    out_deconv4 = self.deconv4(out_rconv)
    out_add4 = out_deconv4 + out_conv3
    # self.Ineurons(self.predict_depth4(out_add4))

    out_deconv3 = self.deconv3(out_add4)
    out_add3 = out_deconv3 + out_conv2
    # self.Ineurons(self.predict_depth3(out_add3))

    out_deconv2 = self.deconv2(out_add3)
    out_add2 = out_deconv2 + out_conv1
    # self.Ineurons(self.predict_depth2(out_add2))

    out_deconv1 = self.deconv1(out_add2)
    out_add1 = out_deconv1 + out_bottom
    self.Ineurons(self.predict_depth1(out_add1))
    img = self.sigmoid(self.Ineurons.v)

    return {'image': img}

def set_init_depths_potentials(self, depth_prior):
    self.Ineurons.v = depth_prior
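To make the readout above concrete: because `self.Ineurons` has `v_threshold=float('inf')`, these neurons never fire; their membrane potential simply integrates the input current, and the output image is the sigmoid of the final potential. A minimal NumPy sketch of that mechanism (illustrative only, not SpikingJelly code; `ineuron_readout` is a made-up name):

```python
import numpy as np

def ineuron_readout(currents):
    """Integrate input currents into a membrane potential that never
    resets (infinite threshold, so no spikes), then apply a sigmoid.

    currents: iterable of [H, W] arrays, one per timestep.
    """
    v = 0.0
    for c in currents:
        v = v + c                      # pure integration, no reset
    return 1.0 / (1.0 + np.exp(-v))    # sigmoid of the final potential

# two timesteps: zero current, then unit current everywhere
img = ineuron_readout([np.zeros((2, 2)), np.ones((2, 2))])
```

This is why `forward` calls `self.Ineurons(...)` for its side effect and then reads `self.Ineurons.v` rather than using the (always-zero) spike output.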

Results

ChidanandKumarKS avatar Feb 21 '22 15:02 ChidanandKumarKS

Hello @ChidanandKumarKS and thank you for your interest in our work !

Just like you, I would intuitively think that the StereoSpike architecture would be a good fit for image reconstruction. However, there are many places where a "mistake" could have crept in, and I don't have enough information. Can you tell me more about your approach? For instance, have you double-checked your loss, and if so, is it suitable for grayscale image reconstruction? Are you using the same dataloading as in this repo? Do you use a specific format for the input data? (I see that your `bottom` layer takes 5 input channels.) Does your loss decrease, and for how many epochs has your model been trained?
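On the loss question: reconstructions "with no texture" often point to a purely pixelwise loss washing out high frequencies. One common remedy is to add an image-gradient term. A small NumPy sketch of that idea (illustrative only; this is not the loss from the StereoSpike paper, and `reconstruction_loss` / `grad_weight` are made-up names):

```python
import numpy as np

def reconstruction_loss(pred, target, grad_weight=0.5):
    """Pixelwise MSE plus an L1 penalty on finite-difference image
    gradients, which discourages over-smooth, texture-free outputs.

    pred, target: [H, W] grayscale arrays in [0, 1].
    """
    mse = np.mean((pred - target) ** 2)
    # horizontal and vertical gradient differences
    gx = np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1)).mean()
    gy = np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)).mean()
    return mse + grad_weight * (gx + gy)

loss = reconstruction_loss(np.zeros((4, 4)), np.ones((4, 4)))
```

In practice people also reach for perceptual losses (e.g. LPIPS) for event-based image reconstruction, but the gradient term is the simplest thing to try first.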

Concerning the architecture, I don't see anything wrong in particular with what you've shared. The only thing I would point out is that you're not using StereoSpike's intermediate predictions (the commented-out `predict_depth4`/`predict_depth3`/`predict_depth2` calls in your `forward`); this is just a note in case it is unintentional.
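For context, intermediate predictions are typically used for deep supervision: each decoder scale emits a prediction, and the per-scale losses against a downsampled target are summed. A generic NumPy sketch of that pattern (illustrative only; simple 2x2 average pooling stands in for whatever downsampling the repo actually uses, and both function names are made up):

```python
import numpy as np

def avg_pool2(x):
    """Downsample a [H, W] array by 2x2 average pooling."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))

def multiscale_loss(preds, target):
    """Sum of per-scale MSE losses for deep supervision.

    preds: list of predictions, full resolution first, each half the
    size of the previous one; target is pooled to match each scale.
    """
    total, t = 0.0, target
    for p in preds:
        total += np.mean((p - t) ** 2)
        t = avg_pool2(t)
    return total
```

Re-enabling the commented-out `self.Ineurons(self.predict_depth{4,3,2}(...))` calls and supervising each scale this way gives the decoder a training signal at every resolution.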

Sorry for answering a bit late, but I hope these answers help! Don't hesitate to ask if you have more questions!

urancon avatar Feb 25 '22 09:02 urancon

https://github.com/fangwei123456/spikingjelly/issues/177#issuecomment-1073547664

fangwei123456 avatar Mar 21 '22 07:03 fangwei123456