GraphSAGE
about models.py
I find that tf.nn.fixed_unigram_candidate_sampler() outputs negative samples that include some positive samples. The output of negative sampling should not include any positives (i.e., the IDs listed in true_classes). I am confused about this.
import numpy as np
import tensorflow as tf  # TF 1.x API

labels = [1, 2, 3]
batch_size = len(labels)
degs = np.array([3, 2, 3, 4, 2, 1, 1, 5, 5, 4, 6, 2, 1, 7])

true_classes = tf.reshape(
    tf.cast(labels, dtype=tf.int64),
    [batch_size, 1])

neg_samples, _, _ = tf.nn.fixed_unigram_candidate_sampler(
    true_classes=true_classes,
    num_true=1,
    num_sampled=4,
    unique=False,
    range_max=len(degs),
    distortion=0.75,
    unigrams=degs.tolist())

with tf.Session() as sess:
    print(sess.run(neg_samples))
Possible output (from three separate runs):
[ 0  1  9 11]
[ 7  2 13  0]
[ 3  9  0  3]
Note that the third run samples 3, which is one of the positives in true_classes.
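If the occasional collision matters for your application, one simple workaround is to post-filter the sampler's output and redraw any negative that matches a known positive. Below is a minimal sketch of that idea in plain NumPy; the function name and signature are illustrative, not part of GraphSAGE, and the redraw uses the same unigram^0.75 distribution the sampler uses.

```python
import numpy as np

def filter_negatives(neg_samples, positives, degs, distortion=0.75, rng=None):
    # Hypothetical post-filtering step: resample any negative that
    # collides with a known positive, drawing replacements from the
    # same distorted-unigram distribution as the candidate sampler.
    rng = rng or np.random.default_rng(0)
    p = np.asarray(degs, dtype=float) ** distortion
    p /= p.sum()
    positives = set(positives)
    out = []
    for s in neg_samples:
        while s in positives:
            s = rng.choice(len(p), p=p)  # redraw until non-positive
        out.append(int(s))
    return out

degs = [3, 2, 3, 4, 2, 1, 1, 5, 5, 4, 6, 2, 1, 7]
# The third run above produced [3, 9, 0, 3]; 3 is a positive.
print(filter_negatives([3, 9, 0, 3], positives=[1, 2, 3], degs=degs))
```

This keeps the sampler itself untouched and only pays the redraw cost on the rare collision.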
That could be true. We are relying on the assumption that the probability of a random negative sample landing on a positive is small when the entire graph is much larger than the neighborhood computation graph. So in general this does not cause much of an issue, while it tremendously improves efficiency.
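To make that assumption concrete, you can compute the chance that a single draw from the distorted-unigram distribution hits a positive. With the tiny 14-node degree array from the snippet above the collision probability is sizable, which is exactly why the toy example shows positives leaking in; on a graph with millions of nodes the same quantity becomes negligible. A quick sketch:

```python
import numpy as np

# Probability that one unigram^0.75 draw lands on a positive class,
# using the 14-entry degree array from the snippet above.
degs = np.array([3, 2, 3, 4, 2, 1, 1, 5, 5, 4, 6, 2, 1, 7], dtype=float)
p = degs ** 0.75
p /= p.sum()

positives = [1, 2, 3]
p_hit = float(p[positives].sum())  # chance a single draw is a positive
print(round(p_hit, 3))  # around 0.2 for this toy graph

# On a real graph the same few positives sit among millions of nodes,
# so p_hit shrinks toward zero and collisions become rare.
```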
You can check PinSAGE for better negative-sampling schemes.
Where can I find PinSAGE? Thanks.
The code for https://arxiv.org/pdf/1806.01973.pdf is not open source due to corporate constraints, but the changes relative to GraphSAGE are described in the paper. In the case of negative sampling, you just need to implement a count-based PPR approximation.
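For readers unfamiliar with the idea: the PinSAGE paper approximates personalized PageRank (PPR) by running short random walks with restart from a query node and counting visits; nodes with high visit counts that are not actual neighbors make good "hard" negatives. The PinSAGE code itself is closed source, so the sketch below is only an illustration of that counting scheme on a toy adjacency list; all names and parameters here are made up for the example.

```python
import random
from collections import Counter

def ppr_visit_counts(adj, start, num_walks=1000, walk_len=3,
                     restart=0.5, seed=0):
    # Count-based PPR approximation: run many short random walks with
    # restart from `start` and tally how often each node is visited.
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            if rng.random() < restart or not adj[node]:
                node = start            # restart the walk at the query node
            else:
                node = rng.choice(adj[node])  # step to a random neighbor
            counts[node] += 1
    return counts

# Tiny toy graph: adjacency lists indexed by node id.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
counts = ppr_visit_counts(adj, start=0)

# Hard negatives for node 0: frequently visited nodes that are neither
# node 0 itself nor its direct neighbors, ordered by visit count.
hard = [n for n, _ in counts.most_common() if n not in adj[0] and n != 0]
print(hard)
```

Ranking candidates by visit count gives negatives that are "close but not connected," which the paper reports as more informative than uniform negatives.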