AlexNet backward shape mismatch + ReLU returns a tuple
Hi,
I have implemented AlexNet in SINGA, but I get an error during the `backward_and_update` call. I am using SINGA 3.0.0.rc1 on CPU.
This is my AlexNet implementation:

```python
from singa import autograd
from singa import module
from singa import opt

__all__ = ['AlexNet', 'alexnet']


class AlexNet(module.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        # 12 on GPU, so 6 & 6
        self.features1 = [
            autograd.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            autograd.ReLU(),
            autograd.MaxPool2d(kernel_size=3, stride=2),
            autograd.Conv2d(64, 192, kernel_size=5, padding=2),
            autograd.ReLU(),
            autograd.MaxPool2d(kernel_size=3, stride=2),
            autograd.Conv2d(192, 384, kernel_size=3, padding=1),
            autograd.ReLU(),
            autograd.Conv2d(384, 256, kernel_size=3, padding=1),
            autograd.ReLU()
        ]
        self.features2 = [
            autograd.Conv2d(256, 256, kernel_size=3, padding=1),
            autograd.ReLU(),
            autograd.MaxPool2d(kernel_size=3, stride=2)
        ]
        self.avgpool = autograd.AvgPool2d(6, stride=1)
        self.flatten = autograd.Flatten()
        self.classifier = [
            autograd.Dropout(),
            autograd.Linear(256 * 6 * 6, 4096),
            autograd.ReLU(),
            autograd.Dropout(),
            autograd.Linear(4096, 4096),
            autograd.ReLU(),
            autograd.Linear(4096, num_classes)
        ]
        self.optimizer = opt.SGD(lr=0.001, momentum=0.9)

    def loss(self, out, ty):
        return autograd.softmax_cross_entropy(out, ty)

    def optim(self, loss, dist_option, spars):
        if dist_option == 'fp32':
            self.optimizer.backward_and_update(loss)
        elif dist_option == 'fp16':
            self.optimizer.backward_and_update_half(loss)
        elif dist_option == 'partialUpdate':
            self.optimizer.backward_and_partial_update(loss)
        elif dist_option == 'sparseTopK':
            self.optimizer.backward_and_sparse_update(loss, topK=True, spars=spars)
        elif dist_option == 'sparseThreshold':
            self.optimizer.backward_and_sparse_update(loss, topK=False, spars=spars)

    def forward(self, x):
        for (i, layers) in enumerate([self.features1, self.features2,
                                      [self.avgpool, self.flatten], self.classifier]):
            for (j, fn) in enumerate(layers):
                x = fn(x)
                if type(x) is tuple:  # FIXME I have to do that because of a bug in Singa? (ReLU)
                    x = x[0]
        return x


def alexnet(**kwargs):
    return AlexNet(**kwargs)
```

And I get:

```
AssertionError: ('shape mismatch', (9216, 4096), (256, 4096))
```

which corresponds to my first linear layer: `256 * 6 * 6, 4096`.
When I use my VGG16 implementation, I get a similar error: `AssertionError: ('shape mismatch', (25088, 4096), (512, 4096))`.
It seems that the backward operation does not map the correct shape to the corresponding layer.
Moreover, the ReLU class returns a 1-tuple containing a Tensor. Is that intended, or is it a bug?
Hi, as pointed out by @chrishkchris, the convention is to use ReLU as a stateless operation. Usage: https://github.com/apache/singa/blob/master/examples/cnn/model/cnn.py#L40
For the shape mismatch, you might need to check the layer shapes again. Let me know if further info is required.
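For reference, a minimal sketch of the stateless usage, following that cnn.py example (the `conv1`/`pooling1` layer names here are just placeholders, not taken from your code):

```python
from singa import autograd

def forward(self, x):
    y = self.conv1(x)      # hypothetical Conv2d layer, as in the cnn.py example
    y = autograd.relu(y)   # functional ReLU: returns a single Tensor, not a tuple
    y = self.pooling1(y)   # hypothetical MaxPool2d layer
    return y
```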
OK, I'll try, but why provide a stateful ReLU layer? Is it for a specific purpose?
I compared my implementation to other frameworks, and the shapes are the same. Moreover, the forward pass does not cause any issue; it is the backward pass that fails. This is why I suspect a bug. Is it possible?
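For reference, this is roughly how the intermediate shapes can be checked during the forward pass (a sketch; it assumes the SINGA Tensor exposes a `.shape` attribute):

```python
def forward(self, x):
    for layers in [self.features1, self.features2,
                   [self.avgpool, self.flatten], self.classifier]:
        for fn in layers:
            x = fn(x)
            if type(x) is tuple:  # ReLU tuple workaround, see above
                x = x[0]
            print(type(fn).__name__, x.shape)  # compare against the expected AlexNet shapes
    return x
```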
@dcslin Did you try to run the code pasted by @Belegkarnil? Can you reproduce the error?
I am still checking the code.
Hi @Belegkarnil, you might need to change `256 * 6 * 6, 4096` to `256, 4096` to make it work: with `AvgPool2d(6, stride=1)` applied to a 6x6 feature map, the flattened output has 256 features, not 256 * 6 * 6.
Also, it is recommended to use relu/dropout/flatten as in https://github.com/apache/singa/blob/master/examples/cnn/model/cnn.py#L40
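For reference, a rough sketch of what the classifier part could look like with that change (it assumes the flattened AvgPool output is 256 features; please double-check the shapes on your side):

```python
# in __init__: the first Linear takes the flattened AvgPool output (assumed to be 256)
self.linear1 = autograd.Linear(256, 4096)
self.linear2 = autograd.Linear(4096, 4096)
self.linear3 = autograd.Linear(4096, num_classes)

# in forward(), after self.flatten(x):
x = autograd.relu(self.linear1(x))
x = autograd.relu(self.linear2(x))
x = self.linear3(x)
```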
OK, thanks a lot! I assumed it worked like in other frameworks, but the AvgPool output actually has a different shape here.