botnet-detection icon indicating copy to clipboard operation
botnet-detection copied to clipboard

Correspondences of normalization between code and paper

Open grigvardanyan opened this issue 3 years ago • 4 comments

In the paper described random walk normalization instead of symmetric normalization.

image

In the source code, I think it is implemented in an incorrect way.

image

Here we normalize each node based on the inverse of its degree, then we do aggregation. I think we have to aggregate all features, then in the end apply an inverse degree normalization.

Assume x1, x2, x3 are nodes. If we want update representation of x3 node and it is connected with x1 and x2, we calculate in the following way relu(x1/degree_x1 + x2/degree_x2), but actually we have to do relu(x1/degree_x1 + x2/degree_x1). So I think we have to normalize based on source node degree.

Please correct me if I understood something wrong. Thanks in advance.

grigvardanyan avatar Feb 22 '22 13:02 grigvardanyan

Hi thanks for the question! Currently it is normalized by the source node degree when self.deg_norm == 'rw'. Can you explain a bit about "but actually we have to do relu(x1/degree_x1 + x2/degree_x1)" where x2 also gets normalized by degree_x1?

jzhou316 avatar Feb 23 '22 14:02 jzhou316

Here is sketch what I meant, sorry for not good quality :) image

grigpuma avatar Feb 23 '22 16:02 grigpuma

Hi I think based on your description, we first normalize the feature for each node based on it's outgoing degree, and then aggregate the features of different source nodes into a destination node. That's also what we did in the code. So for example in your above graph, we have sources of x1 being x0, x2, x4 (by looking at the 1st column of matrix A), then the resulted x1 = x0/2 + x2/2 + x4/4. Could you elaborate a bit more about the discrepancy? Sorry for my confusion.

jzhou316 avatar Feb 23 '22 17:02 jzhou316

the line x_j = torch.index_select(x_j, 0, edge_index[0]) just duplicates the node features into each edge, which is sizing the source feature matrix (N x C) into a edge feature matrix (E x C) in order for aggregation. For example, if edge0 = (x_1, x_3), edge1 = (x_0, x_4), edge2 = (x_1, x_8), ..., then the resized matrix would be [feature of x_1; feature of x_0; feature of x_1; ...]. There is no normalization in this step.

jzhou316 avatar Feb 23 '22 17:02 jzhou316