Matrix-Capsules-EM-Tensorflow
EM routing for convolutional capsules
Hi,
thanks for sharing your implementation. I read through your high-level description at https://openreview.net/forum?id=HJWLfGWRb, and I have a question about your implementation of EM routing for convolutional capsule layers. I haven't looked deeply into your code yet, so I apologize in advance if I'm wrong.
In particular, I think that you clone each patch / receptive field into a column matrix and then handle each receptive field separately, as in the fully connected case. But this disregards the fact that each input capsule appears in multiple receptive fields - and these influence each other. So if capsule I1 appears in the receptive fields of capsules O1 and O2, and the EM determines that O1 is a good match for I1, then it cannot also be a good match for O2.
This cross-influencing of capsules is briefly mentioned in the paper:
> For convolutional capsules, each capsule in layer L+1 sends feedback only to capsules within its receptive field in layer L. Therefore each convolutional instance of a capsule in layer L receives at most kernel_size x kernel_size feedback from each capsule type in layer L+1.
Am I correct that this means that one cannot treat each input patch in isolation but has to run a global EM pass, respecting the more complex connectivity?
If you mean that there are overlaps between receptive fields, we have tackled this in the 'kernel_tile' function. Beyond that, I don't think convolutional capsules contradict the cross-influence you describe. Basically, the input and output interfaces of EM routing are consistent with convolutional layers, except that scalars are replaced by matrices. The BP algorithm should work well, just as it does for multiple convolutional layers.
> If you mean that there are overlaps between receptive fields, we have tackled this in the 'kernel_tile' function.
By copying the input poses & activations, similar to tf.extract_image_patches, right? So one input capsule appears in multiple "batches", which are treated in isolation? I think you cannot do that (at least not without synchronizing the EM results across batches). I might be wrong, though ;-)
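To make the concern concrete, here is a toy NumPy sketch (not the repo's code) of what overlapping patch extraction does to an input capsule:

```python
import numpy as np

# Toy 1-D example: 4 input capsules, kernel size 3, stride 1.
# Overlapping patch extraction (conceptually what
# tf.extract_image_patches / 'kernel_tile' does) duplicates the
# capsules in the overlap region.
inputs = np.arange(4)                           # capsule indices 0..3
patches = np.stack([inputs[0:3], inputs[1:4]])  # two overlapping patches
print(patches)
# [[0 1 2]
#  [1 2 3]]
# Capsules 1 and 2 now exist as independent copies in both patches.
# If each patch runs its own EM, the two copies of capsule 1 get
# separate, uncoupled assignment probabilities.
```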
Maybe I can make more sense by saying it another way: the EM routing algorithm "explains" pose votes using Gaussians (like a GMM). If one input pose vote is already fully explained by one Gaussian (so r_ij = 1), then it cannot be explained by another Gaussian/output capsule again (so r_ik = 0 for all k != j). If you rip the patches apart and treat them in isolation, this constraint is not enforced. That means small object parts can be part of multiple larger objects at once - which goes against the intuition of capsules as far as I understand it.
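Put differently, the E-step has to normalize each input capsule's assignments over all competing parents at once. A minimal sketch of that constraint, with made-up likelihoods:

```python
import numpy as np

rng = np.random.default_rng(0)
# p[i, j]: (made-up) Gaussian likelihood of input capsule i's vote
# under output capsule j; a[j]: activation of output capsule j.
p = rng.random((5, 3))   # 5 input capsules, 3 competing output capsules
a = rng.random(3)

r = a * p
r /= r.sum(axis=1, keepdims=True)       # normalize over ALL parents j
assert np.allclose(r.sum(axis=1), 1.0)  # sum_j r_ij = 1 per input capsule
# Per-patch EM normalizes only over the parents inside one patch, so
# the copies of the same capsule in other patches are never coupled.
```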
So I think the M-step should run once for each output capsule and iterate over all connected input capsules (you are doing this using kernel_tile), and the E-step should run once for each input capsule and iterate over all connected output capsules. I can't see where you are doing the latter - you probably run it K*K times for each input capsule (once for each output capsule) without aggregating the results.
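Here is a rough sketch of what I mean by a global pass. This is my reading of the paper, not the repo's code; `in_fields`, `out_fields`, `votes`, and the isotropic `gauss` are hypothetical placeholders, and the variance and activation updates are omitted for brevity:

```python
import numpy as np

def gauss(v, mu, sigma=1.0):
    # Isotropic Gaussian likelihood of vote v under mean mu.
    return np.exp(-((v - mu) ** 2).sum() / (2 * sigma ** 2))

def em_iteration(r, votes, acts, in_fields, out_fields):
    """One global routing iteration over the full connectivity.

    r          : r[i][j], assignment of input capsule i to parent j
    votes      : votes[(i, j)], pose vote of input i for parent j
    acts       : acts[j], activation of output capsule j
    in_fields  : in_fields[j], inputs in j's receptive field
    out_fields : out_fields[i], ALL parents whose field contains i
    """
    mu = {}
    # M-step: per OUTPUT capsule, over its receptive field
    # (this is the part kernel_tile already gathers).
    for j, ins in in_fields.items():
        w = np.array([r[i][j] for i in ins])
        v = np.stack([votes[(i, j)] for i in ins])
        mu[j] = (w[:, None] * v).sum(axis=0) / w.sum()
    # E-step: per INPUT capsule, over ALL its parents, so the K*K
    # convolutional copies of i are renormalized together.
    for i, outs in out_fields.items():
        p = np.array([acts[j] * gauss(votes[(i, j)], mu[j]) for j in outs])
        p /= p.sum()
        r[i] = dict(zip(outs, p))
    return r, mu
```

Initializing r[i][j] = 1/len(out_fields[i]) and alternating these two loops is what I would expect a "global" convolutional EM to look like; running EM per patch in isolation skips exactly the cross-patch coupling in the E-step.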
In my opinion the paper is pretty weak in explaining EM routing for convolutional capsule layers - there are literally only three sentences about it. I'm just trying to find out what they mean exactly...