Inquiry about the embedding space

Open HashWLS opened this issue 3 years ago • 13 comments

Dear Zhang,

I am running the experiments from your code. I obtain the embedding representations and then compute the L2 distance as the kernel for node classification and link prediction, but the performance is poor. Could you recommend a suitable embedding space, please?

Thanks, Wei

HashWLS avatar Nov 02 '21 02:11 HashWLS

Hi,

Thanks for your interest! Can you please elaborate on your question? Are you using GraphZoom to obtain the node embeddings? Which dataset are you using? In addition, can you explain how you compute the L2 distance for node classification/link prediction?

Thanks, Chenhui

Chenhui1016 avatar Nov 02 '21 02:11 Chenhui1016

Hi,

Thanks for your quick reply. I embed the Yelp data from 'Accelerated Attributed Network Embedding' with GraphZoom, using GraphSAGE + LAMG. After obtaining each node's embedding, I compute the L2 distance pair by pair and generate the kernel matrix for node classification and link prediction.
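For reference, the pairwise-L2 kernel described above can be sketched as follows. This is a minimal illustration, not code from GraphZoom or the thread; the function name and the RBF-style exponentiation are assumptions about what "kernel matrix" means here.

```python
import numpy as np

def l2_kernel(emb):
    """Pairwise squared L2 distances turned into an RBF-style kernel matrix."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    d2 = np.maximum(d2, 0.0)          # clip tiny negatives from round-off
    gamma = 1.0 / emb.shape[1]        # a common default bandwidth: 1 / dim
    return np.exp(-gamma * d2)        # shape (n, n), diagonal is exactly 1

emb = np.random.rand(5, 16)           # stand-in for 5 node embeddings
K = l2_kernel(emb)
```

A kernel built this way is symmetric with ones on the diagonal; whether it helps downstream depends heavily on how well L2 distance reflects similarity in the embedding space, which is the question being raised here.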

Thanks, Wei

ghost avatar Nov 02 '21 02:11 ghost

We have never used the L2 distance or the kernel matrix you mentioned for label prediction. Have you tried logistic regression for label prediction after obtaining the node embeddings? If the accuracy with logistic regression is still low, can you also show me the table that LAMG produces during coarsening?

Btw, you can also tune some hyperparameters in GraphSAGE to improve the accuracy (e.g., max_total_steps, hidden dimension, and learning rate).
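The evaluation suggested above, logistic regression on the learned embeddings with micro/macro F1, might look like the sketch below. The scikit-learn calls are standard, but the embeddings and labels here are synthetic placeholders, not the Yelp data from the thread.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 32))        # stand-in node embeddings
labels = rng.integers(0, 3, size=200)   # stand-in node labels (3 classes)

X_tr, X_te, y_tr, y_te = train_test_split(
    emb, labels, test_size=0.5, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)

micro = f1_score(y_te, pred, average="micro")
macro = f1_score(y_te, pred, average="macro")
```

This linear-probe style evaluation is the usual way node embeddings are scored in the embedding literature, which is presumably why it is suggested over a distance-kernel approach here.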

Chenhui1016 avatar Nov 02 '21 03:11 Chenhui1016

Hi,

As you suggested, I tried classifying the m10 data via logistic regression and computed the micro F1 and macro F1. They are still low. Could you give me any suggestions, please? I have attached the table.

Thanks! Wei

graphzoom_m10

williamweiwu avatar Nov 17 '21 02:11 williamweiwu

Hi,

The table you showed is from the fusion step. Can you please also show me the LAMG table from the coarsening phase? What reduction ratio are you using for coarsening? Btw, what F1 score do you get with GraphZoom, and what is the F1 score of the baselines (e.g., GraphSAGE)?

Chenhui1016 avatar Nov 17 '21 02:11 Chenhui1016

Do you mean the walk in the attached picture? graphzoom_m10_v1 graphzoom_m10_v2

The reduction ratio is 2. graphzoom_m10_parameters

The results, compared with GraphSAGE, are as follows: graphzoom_m10_f1

Thanks very much! Wei

williamweiwu avatar Nov 17 '21 02:11 williamweiwu

Thanks for the information. For the GraphSAGE baseline, are you using the same code in GraphZoom by disabling fusion, coarsening, and refinement? If so, would you mind sharing this dataset with me to check?

Chenhui1016 avatar Nov 17 '21 03:11 Chenhui1016

No, the GraphSAGE code comes from StellarGraph: https://stellargraph.readthedocs.io/en/stable/

Link: https://pan.baidu.com/s/1mfk9KnYR6HtjzC0-JSKBXw Extraction code: 2gb2

You can use to_npy.py to generate m10-feats.npy, because the file is large.

Note that in the m10 dataset only the first 10310 nodes are labelled. Therefore, we run network embedding on the whole network, including all nodes, but classify only the first 10310 labelled nodes.
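The setup described above, embedding the full graph but evaluating only on the labelled prefix, reduces to a simple slice. The 10310 count is from the thread; the total node count and embedding dimension below are placeholders.

```python
import numpy as np

n_total, n_labeled, dim = 20000, 10310, 64   # n_total and dim are placeholders
emb = np.zeros((n_total, dim))               # embeddings computed for ALL nodes
labels = np.zeros(n_labeled, dtype=int)      # labels exist only for the first 10310

# Classification uses only the labelled prefix of the embedding matrix.
emb_labeled = emb[:n_labeled]
```

Keeping the unlabelled nodes in the graph during embedding is deliberate: they still contribute structure (edges, neighborhoods) even though they never appear in the classifier's train/test split.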

Thanks very much.

Wei

williamweiwu avatar Nov 17 '21 03:11 williamweiwu

I see. In this case I think the main reason may be that the default GraphSAGE hyperparameters in our code are not as well tuned for the Yelp dataset as those in the StellarGraph GraphSAGE code you used. I would suggest integrating the GraphSAGE model from StellarGraph (with the same hyperparameters) into GraphZoom and then evaluating the F1 score for a fair comparison. If this new GraphZoom+GraphSAGE still has a low F1, I can help run your dataset.

Chenhui1016 avatar Nov 17 '21 03:11 Chenhui1016

Could you tell me the key parameters, please? I will prioritize tuning them.

Thanks Wei

williamweiwu avatar Nov 18 '21 01:11 williamweiwu

I would suggest tuning "max_total_steps", "learning rate", "neg_sample_size", and "hidden dimension". Btw, you are using the unsupervised version (instead of the supervised version) of GraphSAGE from StellarGraph, right?
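The four knobs named above can be swept with a small grid. The parameter names are the ones mentioned in this thread; the candidate values below are illustrative starting points, not recommendations from the GraphZoom authors.

```python
from itertools import product

# Hypothetical sweep over the hyperparameters suggested in the thread.
grid = {
    "max_total_steps": [1000, 5000, 10000],
    "learning_rate": [0.01, 0.001],
    "neg_sample_size": [10, 20],
    "hidden_dim": [128, 256],
}

# Cartesian product: 3 * 2 * 2 * 2 = 24 configurations to try.
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
```

Each config dict would then be passed to a training run and scored with the logistic-regression F1 evaluation discussed earlier; the best config wins.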

Chenhui1016 avatar Nov 18 '21 01:11 Chenhui1016

OMG, I used the supervised version from StellarGraph. That is the key reason. However, another learning-based algorithm (not a GNN) can achieve an F1 score over 0.7, which still outperforms GraphZoom.

Thanks for your time and help. Wei

williamweiwu avatar Nov 18 '21 02:11 williamweiwu

Good to know you found the reason. Note that GraphZoom is just a framework, and you can plug in whatever graph embedding model you want. If another learning-based algorithm outperforms our GraphZoom+GraphSAGE, you can plug that algorithm into GraphZoom, which should achieve even higher accuracy.

Chenhui1016 avatar Nov 18 '21 04:11 Chenhui1016