Squash function
Very nice tutorial, though I want to point out that the squash formula in the notebook differs from the paper. Instead of the formula shown there, it should be

v_j = (||s_j||^2 / (1 + ||s_j||^2)) * (s_j / ||s_j||)

so the first fraction is a factor slightly below 1, and the second one normalizes the vector coordinates by the magnitude.
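As a quick sanity check of that property (a minimal sketch, not part of the original notebook; s below is just a random stand-in for some s_j):

import torch

s = torch.randn(13)  # an arbitrary capsule input s_j
v = (s.norm() ** 2 / (1 + s.norm() ** 2)) * (s / s.norm())
print(v.norm())  # equals ||s||^2 / (1 + ||s||^2), always strictly below 1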
As far as I can see, the implementation follows the paper's formula and seems correct, except that I am not sure about the normalization dimension for the primary capsules. According to the explanations in the notebook, each primary capsule outputs a vector of size 32 * 6 * 6. These vectors are then stacked and, including the batch dimension, we get a tensor of shape
(batch_size, num_nodes_in_capsule = 32 * 6 * 6, num_capsules = 8)
Finally, these vectors are normalized, i.e. their magnitudes are squashed into the range from 0 to 1. If I understand correctly, you are talking about the magnitude of the (32 * 6 * 6)-dimensional vectors. So if we want to ensure that the length of these vectors is in the range [0; 1], we would have to divide each of the (32 * 6 * 6) coordinates by the square root of the sum of squares of these coordinates. Right? In fact, the implementation divides each coordinate by the magnitude of a vector composed of the coordinates at the same position across all capsule vectors: note that dim is set to -1 when calculating squared_norm, i.e. it sums over the same feature taken from different capsules.
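To make the two candidate reduction axes concrete (a shape-only sketch; the sizes come from the notebook's primary-capsule layer, the batch size of 4 is made up):

import torch

x = torch.randn(4, 32 * 6 * 6, 8)  # (batch, num_nodes_in_capsule, num_capsules)
print((x ** 2).sum(dim=-1, keepdim=True).shape)  # torch.Size([4, 1152, 1]): sums across the 8 capsules
print((x ** 2).sum(dim=-2, keepdim=True).shape)  # torch.Size([4, 1, 8]): sums within each 1152-dim capsule vector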
Please consider the following example:
import torch
import numpy as np

def squash(input_tensor):
    '''Squashes an input Tensor so it has a magnitude between 0-1.
       param input_tensor: a stack of capsule inputs, s_j
       return: a stack of normalized, capsule output vectors, v_j
    '''
    squared_norm = (input_tensor ** 2).sum(dim=-1, keepdim=True)
    scale = squared_norm / (1 + squared_norm)  # normalization coeff
    output_tensor = scale * input_tensor / torch.sqrt(squared_norm)
    return output_tensor

np.random.seed(1)
torch.manual_seed(1)

batch_size = 15
dim = 13
n_caps = 7

u = [torch.tensor(np.random.rand(batch_size, dim, 1)) for i in range(n_caps)]
u = torch.cat(u, dim=-1)  # shape: (batch_size, dim, n_caps)
print("u:", u)

u_squash = squash(u)
print("u_squash:", u_squash)

# magnitude of each capsule vector, taken along the dim axis
mag = torch.sqrt((u_squash ** 2).sum(dim=-2))
print("mag: ", mag)
Here I create a randomly filled tensor of shape (batch_size, dim, n_caps), i.e. similar to the one produced by the primary capsules. The tensor is squashed by the same function used in the notebook. It can be seen from the output that some of the magnitudes exceed 1, i.e. the vectors are not confined to the range [0; 1]:
mag: tensor([[0.6629, 1.0954, 0.9715, 0.7817, 1.0211, 0.7117, 0.8847],
[1.0202, 0.9313, 0.8816, 0.8383, 1.0355, 0.9926, 1.0803],
[0.8864, 1.0694, 0.7617, 0.9194, 0.8355, 0.9432, 1.0051],
[0.9630, 0.9198, 0.9078, 1.0516, 0.8845, 0.7888, 0.9238],
[0.6996, 1.0998, 1.1319, 0.6556, 0.8243, 0.9571, 0.9614],
[0.9705, 0.9879, 0.8915, 0.8308, 1.0063, 1.0607, 0.9306],
[1.0569, 1.0294, 0.9268, 1.0508, 0.9768, 0.9505, 0.8103],
[0.9545, 0.9655, 0.9052, 1.0720, 0.7246, 0.9666, 0.9669],
[1.1237, 0.9768, 0.9749, 0.8128, 0.8935, 0.9216, 0.7607],
[0.8785, 0.7155, 0.8306, 0.8913, 0.9764, 0.9692, 1.0892],
[0.9691, 0.8658, 1.0399, 0.9774, 0.9309, 0.8950, 0.8872],
[0.7124, 1.1386, 0.8535, 1.0913, 0.8478, 0.8779, 0.9850],
[0.8909, 0.9851, 0.9247, 1.0239, 0.7927, 0.9618, 0.7925],
[0.8764, 0.9524, 0.9294, 0.8517, 0.8385, 0.9380, 1.0824],
[1.0076, 0.8668, 1.0051, 0.9030, 1.0067, 0.8850, 0.9519]],
dtype=torch.float64)
It actually enforces the magnitudes of vectors composed of a particular coordinate taken from each capsule's output to be in that range. But was that intended?
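For comparison, if the sum is taken along dim=-2 instead, i.e. within each capsule vector, every per-capsule magnitude stays strictly below 1. A minimal sketch reusing the same setup (squash_along is a hypothetical variant for illustration, not from the notebook):

import torch
import numpy as np

def squash_along(input_tensor, dim):
    # same squash as above, but with a configurable reduction axis
    squared_norm = (input_tensor ** 2).sum(dim=dim, keepdim=True)
    scale = squared_norm / (1 + squared_norm)
    return scale * input_tensor / torch.sqrt(squared_norm)

np.random.seed(1)
u = torch.cat([torch.tensor(np.random.rand(15, 13, 1)) for i in range(7)], dim=-1)

v = squash_along(u, dim=-2)             # treat the 13-dim axis as the capsule vector
mag = torch.sqrt((v ** 2).sum(dim=-2))  # per-capsule magnitudes, shape (15, 7)
print(mag.max())                        # strictly below 1 for every vector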