prepare_dataset generalization

Open beabevi opened this issue 5 years ago • 0 comments

Hi, I was working with relational-gcn and noticed that prepare_dataset.py zeroes the rows of the adjacency matrices that are not needed, to save memory and computation. However, the pruning keeps only the labeled nodes and their direct neighbors, so if the number of layers is greater than 2 it zeroes rows of nodes that are still needed for the computation.
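A toy check of this claim, as a sketch only: it assumes a simplified propagation `h <- A h` with no weights, no nonlinearity and a single relation, which is not the actual relational-gcn layer. The helper names `propagate` and `zero_rows` are made up for the illustration. On the chain 0 - 1 - 2 - 3 with node 0 labeled, keeping only the rows of the labeled node and its direct neighbor reproduces the 2-layer output at node 0 but not the 3-layer one.

```python
def propagate(adj, features, num_layers):
    """Apply h <- A h num_layers times (simplified, weight-free layer)."""
    h = list(features)
    for _ in range(num_layers):
        h = [sum(a * x for a, x in zip(row, h)) for row in adj]
    return h


def zero_rows(adj, rows):
    """Return a copy of adj with the given row indices set to zero."""
    return [[0] * len(row) if i in rows else list(row)
            for i, row in enumerate(adj)]


# Chain graph 0 - 1 - 2 - 3; node 0 is the only labeled node.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [1, 1, 1, 1]

keep_one_hop = zero_rows(adj, {2, 3})  # rows kept: labeled + 1-hop
keep_two_hop = zero_rows(adj, {3})     # rows kept: labeled + 1-hop + 2-hop

# Output at the labeled node for each setting.
full_2 = propagate(adj, features, 2)[0]
full_3 = propagate(adj, features, 3)[0]
one_hop_2 = propagate(keep_one_hop, features, 2)[0]
one_hop_3 = propagate(keep_one_hop, features, 3)[0]
two_hop_3 = propagate(keep_two_hop, features, 3)[0]

print(full_2, one_hop_2)             # equal: 1-hop pruning is safe for 2 layers
print(full_3, one_hop_3, two_hop_3)  # 1-hop pruning differs for 3 layers
```

In this toy setting, the rows of nodes up to `num_layers - 1` hops from the labeled set have to be kept for the output at the labeled node to stay unchanged.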

If I understood the code correctly, the optimization can be made to depend on the number of layers by updating the code as follows.

diff --git a/rgcn/prepare_dataset.py b/rgcn/prepare_dataset.py
index ef22e16..9c52c91 100644
--- a/rgcn/prepare_dataset.py
+++ b/rgcn/prepare_dataset.py
@@ -48,11 +48,12 @@ t = time.time()
 bfs_generator = bfs_relational(A, labeled_nodes_idx)
 lvls = list()
 lvls.append(set(labeled_nodes_idx))
-lvls.append(set.union(*bfs_generator.next()))
+for _ in range(NUM_GC_LAYERS - 1):
+    lvls.append(set.union(*next(bfs_generator)))
 print("Done! Elapsed time " + str(time.time() - t))
 
 # Delete unnecessary rows in adjacencies for memory efficiency
-todel = list(set(range(num_nodes)) - set.union(lvls[0], lvls[1]))
+todel = list(set(range(num_nodes)) - set.union(*lvls))
 for i in range(len(A)):
     csr_zero_rows(A[i], todel)
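The same idea as a self-contained sketch that can be run without the repository. It is not the code from prepare_dataset.py: `bfs_levels` and `rows_to_zero` are hypothetical helpers, and each relation is represented as a plain dict mapping a node to the set of nodes its adjacency row points at, instead of a sparse matrix. Unlike the diff above, it stops early if the BFS runs out of new nodes before `num_layers - 1` hops.

```python
def bfs_levels(adjs, roots, num_hops):
    """Return [roots, 1-hop frontier, ..., num_hops-hop frontier] as sets.

    adjs holds one dict per relation, mapping a node to its neighbor set.
    """
    visited = set(roots)
    levels = [set(roots)]
    for _ in range(num_hops):
        frontier = set()
        for adj in adjs:
            for v in levels[-1]:
                frontier |= adj.get(v, set())
        frontier -= visited
        if not frontier:
            break  # graph exhausted before reaching num_hops
        visited |= frontier
        levels.append(frontier)
    return levels


def rows_to_zero(adjs, num_nodes, labeled, num_layers):
    """Rows that are more than num_layers - 1 hops from any labeled node."""
    levels = bfs_levels(adjs, labeled, num_layers - 1)
    return sorted(set(range(num_nodes)) - set.union(*levels))


# Toy chain 0 - 1 - 2 - 3 - 4 with a single relation; node 0 is labeled.
chain = [{0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}]
two_layer = rows_to_zero(chain, 5, [0], num_layers=2)
three_layer = rows_to_zero(chain, 5, [0], num_layers=3)
print(two_layer)    # rows zeroed with 2 layers
print(three_layer)  # rows zeroed with 3 layers
```

With 2 layers this zeroes everything beyond the direct neighbors, which matches the current behavior; with 3 layers one more hop is kept.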

Please let me know if this is correct or if I am missing something.

Jan 26 '20 17:01 beabevi