
A question about negative samples generation in preprocessing.py

Open huxiaoti opened this issue 3 years ago • 4 comments

Hi Thomas,

I'm confused about how you generate the negative edges for the validation set:

    val_edges_false = []
    while len(val_edges_false) < len(val_edges):
        idx_i = np.random.randint(0, adj.shape[0])
        idx_j = np.random.randint(0, adj.shape[0])
        if idx_i == idx_j:
            continue
        if ismember([idx_i, idx_j], train_edges):
            continue
        if ismember([idx_j, idx_i], train_edges):
            continue
        if ismember([idx_i, idx_j], val_edges):
            continue
        if ismember([idx_j, idx_i], val_edges):
            continue
        if val_edges_false:
            if ismember([idx_j, idx_i], np.array(val_edges_false)):
                continue
            if ismember([idx_i, idx_j], np.array(val_edges_false)):
                continue
        val_edges_false.append([idx_i, idx_j])

However, the negative edges for the test set are checked against edges_all instead:

if ismember([idx_i, idx_j], edges_all):
    continue

Why does the validation set check train_edges and val_edges separately, e.g. ismember([idx_j, idx_i], train_edges) and ismember([idx_i, idx_j], val_edges), instead of using a single ismember([idx_i, idx_j], edges_all)?

Wu Shiauthie

huxiaoti, Mar 24 '21 12:03

Hi, I had the same issue.

I gave it some thought and realized that the negative samples for the validation/training sets should be allowed to come from the test edges. Otherwise, the algorithm would gain an unfair advantage on the test samples, because excluding them leaks information about the test set.

In other words, edges in the test set can be sampled as negative examples in the validation/training sets (this could happen in a real world scenario).

So, this explains why ismember is checked separately against train_edges and val_edges. However, there is also this line:

assert ~ismember(val_edges_false, edges_all)

I don't understand the purpose of this assertion.

gonzalesMK, Oct 19 '21 15:10

I understand why the assertion error sometimes appears when running the program: an entry of val_edges_false may also appear in edges_all, because the sampling loop only excludes pairs found in train_edges and val_edges.

File "train.py", line 47, in <module>
    adj_train, train_edges, val_edges, val_edges_false, test_edges, test_edges_false = mask_test_edges(adj)
  File "/home/lf/work/gae/gae/preprocessing.py", line 100, in mask_test_edges
    assert ~ismember(val_edges_false, edges_all)
AssertionError
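
The failure above can be reproduced on a toy graph. This is a minimal sketch, not the repository's code: is_member is a hypothetical stand-in for ismember, the edge arrays are made up, and it assumes that edges_all stores every edge in both directions while the train/val/test arrays store each edge once.

```python
import numpy as np

def is_member(pairs, edge_array):
    """Return True if any row of `pairs` also appears as a row of `edge_array`.

    Hypothetical stand-in for the repository's ismember helper.
    """
    pairs = np.atleast_2d(np.asarray(pairs))
    edge_array = np.asarray(edge_array)
    # Compare every candidate row with every edge row: (E, P, 2) -> (E, P)
    matches = np.all(pairs[None, :, :] == edge_array[:, None, :], axis=-1)
    return bool(np.any(matches))

# Toy undirected graph on 5 nodes (made-up data), each edge listed once.
train_edges = np.array([[0, 1], [1, 2]])
val_edges = np.array([[2, 3]])
test_edges = np.array([[3, 4]])

# Assumption: edges_all contains both orientations of every edge.
one_way = np.vstack([train_edges, val_edges, test_edges])
edges_all = np.vstack([one_way, one_way[:, ::-1]])

# A sampled pair that happens to be a held-out test edge.
candidate = [4, 3]

# The validation loop only rejects pairs found in train_edges or val_edges.
passes_original_checks = not (
    is_member(candidate, train_edges)
    or is_member(candidate[::-1], train_edges)
    or is_member(candidate, val_edges)
    or is_member(candidate[::-1], val_edges)
)

# The later assertion compares against the full edge list instead.
in_full_graph = is_member(candidate, edges_all)

print(passes_original_checks, in_full_graph)
```

Under these assumptions the candidate is accepted as a validation negative even though it is a real (test) edge, so a check against edges_all afterwards fails whenever such a pair happens to be drawn. This also explains why the error only appears on some runs.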

lif323, Dec 14 '21 03:12

Hi, I think the correct code, that is, code that never triggers the assertion error, is equivalent to the following:

    val_edges_false = []
    while len(val_edges_false) < len(val_edges):
        idx_i = np.random.randint(0, adj.shape[0])
        idx_j = np.random.randint(0, adj.shape[0])
        if idx_i == idx_j:
            continue
        if ismember([idx_j, idx_i], edges_all):
            continue
        if val_edges_false:
            if ismember([idx_j, idx_i], np.array(val_edges_false)):
                continue
            if ismember([idx_i, idx_j], np.array(val_edges_false)):
                continue
        val_edges_false.append([idx_i, idx_j])
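
For anyone who wants to try this idea outside the repository, here is a self-contained sketch of the same rejection sampling. It is an illustration under assumptions, not a drop-in patch: sample_negative_edges is a hypothetical name, it uses Python set lookups in place of ismember and np.random.default_rng in place of np.random.randint, and the toy graph is made up.

```python
import numpy as np

def sample_negative_edges(num_nodes, num_samples, edges_all, rng):
    """Draw distinct node pairs that are not edges anywhere in the graph.

    Hypothetical helper: rejects self-loops, any pair present in edges_all
    (in either orientation), and pairs that were already sampled.
    """
    forbidden = {(int(i), int(j)) for i, j in edges_all}
    seen = set()
    negatives = []
    while len(negatives) < num_samples:
        i = int(rng.integers(0, num_nodes))
        j = int(rng.integers(0, num_nodes))
        if i == j:
            continue  # no self-loops
        if (i, j) in forbidden or (j, i) in forbidden:
            continue  # real edge (train, val, or test)
        if (i, j) in seen or (j, i) in seen:
            continue  # already sampled as a negative
        seen.add((i, j))
        negatives.append([i, j])
    return np.array(negatives)

# Made-up toy graph: a path on nodes 0..4 plus an isolated node 5.
one_way = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
edges_all = np.vstack([one_way, one_way[:, ::-1]])

rng = np.random.default_rng(0)
val_edges_false = sample_negative_edges(
    num_nodes=6, num_samples=3, edges_all=edges_all, rng=rng
)

# The check that the repository's assertion expresses now holds by construction.
edge_set = {tuple(e) for e in edges_all.tolist()}
assert all(tuple(p) not in edge_set for p in val_edges_false.tolist())
```

Note the trade-off raised earlier in this thread: checking against edges_all removes the assertion error, but it also means held-out test edges can never be drawn as validation negatives. The loop also assumes the graph has at least num_samples non-edges; otherwise it would not terminate.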

lif323, Dec 14 '21 03:12

Hello, I am having the same issue: the line assert ~ismember(val_edges_false, edges_all) raises an AssertionError. Did anyone find a solution? Kindly help.

sheenahora, Feb 10 '22 17:02