GraphSAGE

about models.py

Achulei opened this issue 6 years ago · 3 comments

I find that tf.nn.fixed_unigram_candidate_sampler() outputs negative samples that include some positive samples. The output of negative sampling should not include any positive samples (those listed in true_classes). I am confused about this.
import numpy as np
import tensorflow as tf  # TF 1.x API, as in the original issue

labels = [1, 2, 3]            # positive (true) classes
batch_size = len(labels)
degs = np.array([3, 2, 3, 4, 2, 1, 1, 5, 5, 4, 6, 2, 1, 7])  # node degrees used as unigram counts

labels = tf.reshape(
    tf.cast(labels, dtype=tf.int64),
    [batch_size, 1])
# Draw 4 "negative" ids from the degree^0.75 distribution.
neg_samples, _1, _2 = tf.nn.fixed_unigram_candidate_sampler(
    true_classes=labels,
    num_true=1,
    num_sampled=4,
    unique=False,
    range_max=len(degs),
    distortion=0.75,
    unigrams=degs.tolist())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(neg_samples))
Possible output:
run 1: [ 0  1  9 11]
run 2: [ 7  2 13  0]
run 3: [ 3  9  0  3]
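
If collisions with the positives need to be avoided, one option (not something the GraphSAGE code does) is to post-filter the sampled ids against the true classes and redraw until enough clean negatives remain. A minimal sketch in NumPy, assuming the sampler output has already been fetched into a Python list; the helper name and signature are hypothetical:

import numpy as np

def filter_negatives(sampled, true_classes, num_needed, degs, distortion=0.75, rng=None):
    """Hypothetical helper: drop accidental positives and top up with fresh
    draws from the same degs**distortion distribution."""
    rng = rng or np.random.default_rng()
    positives = set(int(c) for c in np.asarray(true_classes).ravel())
    clean = [int(s) for s in sampled if int(s) not in positives]
    probs = np.asarray(degs, dtype=np.float64) ** distortion
    probs /= probs.sum()
    while len(clean) < num_needed:
        draw = int(rng.choice(len(degs), p=probs))
        if draw not in positives:
            clean.append(draw)
    return np.array(clean[:num_needed])

# Example with the values from the issue:
degs = np.array([3, 2, 3, 4, 2, 1, 1, 5, 5, 4, 6, 2, 1, 7])
print(filter_negatives([0, 1, 9, 11], [1, 2, 3], num_needed=4, degs=degs))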

Achulei avatar Mar 18 '19 12:03 Achulei

That could be true. We rely on the assumption that, when the full graph is much larger than the neighborhood computation graph, there is only a small probability that randomly sampled negatives hit a positive example. So in general this does not cause much of an issue, while it tremendously improves efficiency.
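
To put a rough number on that probability, one can compute the chance that a single draw from the distorted unigram distribution lands on one of the positives. On the 14-node toy example from the issue it is sizeable, but on a production-scale graph with millions of nodes and only a few positives per batch it becomes negligible. A quick check, assuming the same degs and labels as above:

import numpy as np

degs = np.array([3, 2, 3, 4, 2, 1, 1, 5, 5, 4, 6, 2, 1, 7], dtype=np.float64)
positives = [1, 2, 3]

probs = degs ** 0.75
probs /= probs.sum()

# Probability that a single sampled id collides with a positive.
p_collision = probs[positives].sum()
print(p_collision)  # ~0.2 on this 14-node toy graph; shrinks toward 0 as the graph grows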

You can check PinSAGE for better negative sampling schemes.

RexYing avatar May 28 '19 05:05 RexYing

Where can I find PinSAGE? Thanks.

anny0316 avatar Feb 11 '20 09:02 anny0316

The code for https://arxiv.org/pdf/1806.01973.pdf is not open source due to corporate constraints, but the changes relative to GraphSAGE are described in the paper. For negative sampling, you just need to implement a count-based PPR approximation; a rough sketch follows below.
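
The count-based PPR approximation described in the PinSAGE paper can be sketched as random walks with restart from a query node, counting how often each node is visited; nodes that rank high but not at the very top are then used as hard negatives. Below is a rough illustration only: the function names, the adjacency-dict input, and the exact rank window are illustrative choices, not code from PinSAGE (the paper samples hard negatives from roughly the 2000-5000 rank range).

import random
from collections import Counter

def ppr_visit_counts(adj, seed, num_walks=1000, walk_len=10, restart_prob=0.15, rng=None):
    """Approximate Personalized PageRank for `seed` by counting visits of
    random walks with restart. `adj` is a dict: node -> list of neighbors."""
    rng = rng or random.Random(0)
    counts = Counter()
    for _ in range(num_walks):
        node = seed
        for _ in range(walk_len):
            if rng.random() < restart_prob or not adj[node]:
                node = seed
            else:
                node = rng.choice(adj[node])
            counts[node] += 1
    return counts

def hard_negatives(adj, seed, rank_lo=500, rank_hi=5000, **kw):
    """Rank nodes by visit count and keep the mid-ranked ones as hard negatives:
    related enough to be informative, but not true neighbors of `seed`.
    The rank window here is illustrative."""
    counts = ppr_visit_counts(adj, seed, **kw)
    ranked = [n for n, _ in counts.most_common() if n != seed and n not in adj[seed]]
    return ranked[rank_lo:rank_hi]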

RexYing avatar Feb 11 '20 19:02 RexYing