
Data Train/Validation Separation for Fully Unsupervised Training

PJthunder opened this issue 5 years ago • 8 comments

Hi there, thank you for your implementation of this paper. I have a question about data preparation. If I have a network without any node features and just want unsupervised network embeddings of the nodes (as in node2vec), what data do I need to prepare for GraphSAGE? Previously, I randomly selected 20% of the nodes for validation and 80% for training, and included in the files only the edges between training nodes and the edges between test nodes. But the embedding results are pretty weird: the validation loss does not change over time. What exactly should I do next? Randomly drop some edges? Looking forward to your help!

PJthunder avatar Jun 03 '19 17:06 PJthunder

You will need features for all nodes as well. When splitting train/test, split the edges, not the nodes: 20% of edges as test, 80% of edges as training.

RexYing avatar Jun 04 '19 19:06 RexYing
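The edge-based split suggested above could be sketched like this (the toy edge list is illustrative, not from the repo; only the 80/20 ratio comes from the comment):

```python
import random

# Hypothetical edge list for a featureless graph; in practice this
# would come from your own data files.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 0)]

random.seed(0)
random.shuffle(edges)

# Split the *edges* (not the nodes): 80% for training, 20% for testing.
split = int(0.8 * len(edges))
train_edges = edges[:split]
test_edges = edges[split:]

print(len(train_edges), len(test_edges))
```

Every node can still appear on both sides of the split; only the edges are partitioned, which is what makes the link-prediction-style unsupervised loss measurable on held-out edges.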

Thanks for the reply! So I can set an identical feature vector for all the nodes, right? Also, if I split the edges 20%/80%, what should we set for nodes that appear in both training and test edges? Should we set different values for the test_removed and train_removed attributes of those nodes?

PJthunder avatar Jun 05 '19 00:06 PJthunder
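On the featureless case: giving every node the same constant vector leaves the model nothing to distinguish nodes by except graph structure, so one common fallback in the GNN literature is one-hot "identity" features. A minimal sketch (the node count and the commented-out filename are hypothetical, not from this thread):

```python
import numpy as np

num_nodes = 5  # hypothetical size of the toy graph

# One-hot identity features: each node starts from a distinct input
# vector, unlike a shared constant vector.
feats = np.eye(num_nodes, dtype=np.float32)

# Would be saved alongside the graph files, e.g.:
# np.save("mydata-feats.npy", feats)  # filename convention assumed
print(feats.shape)
```

Note that identity features do not generalize to unseen nodes, so they only make sense in the transductive setting this thread is discussing.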

I have run into the same problem. The validation loss does not change over time when I use 90% of the nodes for training and 10% for validation in unsupervised training.

ciphoo avatar Jul 09 '19 02:07 ciphoo

I don't understand why we should split into train and test sets in unsupervised mode.

beyondguo avatar Jul 25 '19 15:07 beyondguo

I think for unsupervised learning, we only need to split the edges into training and validation sets.

skx300 avatar Aug 29 '19 09:08 skx300

Is the embedding randomly initialized?

luomuqinghan avatar Mar 06 '20 09:03 luomuqinghan

Is the embedding randomly initialized? I have the same problem. Have you solved it?

luomuqinghan avatar Mar 06 '20 09:03 luomuqinghan

Hi @PJthunder, can you help me understand what train_removed means in the dataset? I am trying to train GraphSAGE on my own dataset and I get an error in construct_adj in the supervised code. Any help would be appreciated. Thanks!

File "/home/ubuntu/GraphSAGE/graphsage/minibatch.py", line 237, in construct_adj
KeyError: '1228675212801200130'

NidhiSultan avatar Aug 31 '20 08:08 NidhiSultan
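A KeyError like the one above typically means a node id referenced in the graph is missing from id_map, and a frequent cause is a string-vs-int mismatch: json.load always keeps JSON object keys as strings, while node ids in the node list may load as ints. A diagnostic sketch with made-up toy data standing in for the `<prefix>-G.json` and `<prefix>-id_map.json` contents (only the id value is taken from the traceback):

```python
# Toy stand-ins for the parsed contents of the graph and id-map files;
# the values are illustrative, not from the actual dataset.
graph_nodes = [{"id": 0}, {"id": 1}, {"id": "1228675212801200130"}]
id_map = {"0": 0, "1": 1}  # json.load keeps object keys as strings

# Normalize both sides to strings before comparing, then list any node
# ids that construct_adj would fail to look up in id_map.
missing = [n["id"] for n in graph_nodes if str(n["id"]) not in id_map]
print("ids missing from id_map:", missing)
```

If the list is non-empty, either regenerate id_map to cover every node, or cast ids consistently (all strings or all ints) when building the files.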