gae
Better mask_test_edges function
@tkipf This is my fix for a slightly more performant `mask_test_edges` function. I managed to reproduce the results of the paper, and it works much better even on large graphs. The only time-consuming steps in this function are the assertions at the end.
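For readers who want a concrete picture, below is a minimal sketch of what such an efficient edge-masking routine can look like: split the undirected edges into train/validation/test sets with a single shuffle, sample an equal number of non-edges using O(1) set lookups, and rebuild a symmetric training adjacency. This is an illustrative reconstruction, not the exact code from this fix; the `test_percent`/`val_percent` parameters follow the #25 version discussed below, and the return values mirror the original `gae` function.

```python
# Illustrative sketch of an efficient mask_test_edges -- NOT the exact code
# from this fix. Assumes a symmetric scipy.sparse adjacency matrix with no
# self-loops.
import numpy as np
import scipy.sparse as sp

def mask_test_edges(adj, test_percent=10., val_percent=5.):
    # Take the strict upper triangle so each undirected edge appears once.
    edges = np.array(sp.triu(adj, k=1).nonzero()).T  # shape (n_edges, 2)

    n_edges = edges.shape[0]
    n_test = int(np.floor(n_edges * test_percent / 100.))
    n_val = int(np.floor(n_edges * val_percent / 100.))

    # Shuffle once and slice, instead of drawing edges one by one.
    rng = np.random.default_rng()
    rng.shuffle(edges)
    test_edges = edges[:n_test]
    val_edges = edges[n_test:n_test + n_val]
    train_edges = edges[n_test + n_val:]

    # Negative sampling: reject candidate pairs that are real edges or
    # duplicates via O(1) set lookups (naive versions get slow here).
    edge_set = {tuple(e) for e in edges}
    n_nodes = adj.shape[0]
    neg_set = set()
    while len(neg_set) < n_test + n_val:
        i, j = rng.integers(0, n_nodes, size=2)
        if i == j:
            continue
        pair = (min(i, j), max(i, j))
        if pair not in edge_set:
            neg_set.add(pair)
    neg = np.array(list(neg_set))
    test_edges_false = neg[:n_test]
    val_edges_false = neg[n_test:]

    # Rebuild a symmetric training adjacency from the remaining edges.
    data = np.ones(train_edges.shape[0])
    adj_train = sp.csr_matrix(
        (data, (train_edges[:, 0], train_edges[:, 1])), shape=adj.shape)
    adj_train = adj_train + adj_train.T

    return adj_train, train_edges, val_edges, val_edges_false, \
        test_edges, test_edges_false
```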
Thanks -- I'll leave this open for now in case someone is interested in a (working) example of a more efficient implementation. I rolled the master branch back to the original version before @philipjackson's PR to keep it in line with the paper.
Dear all,
Contrary to previous comments (here + #54), I was able to reproduce all results from @tkipf's original paper using @philipjackson's implementation (see #25) of the `mask_test_edges` function.
I suspect that the previous issues simply come from different train/validation/test splits. Indeed, @philipjackson set the default parameter values to `test_percent=30.` and `val_percent=20.`, whereas @tkipf used `test_percent=10.` and `val_percent=5.` in his experiments. So, @philipjackson masks more edges from the training graph than the original experiments did, which leads to lower performance. With the corrected parameters, I reach results consistent with the paper.
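Concretely, matching the paper's setup just means passing those values explicitly. A short sketch, assuming the #25 signature described above:

```python
# Splits matching the paper: 10% test edges, 5% validation edges.
adj_train, train_edges, val_edges, val_edges_false, \
    test_edges, test_edges_false = mask_test_edges(
        adj, test_percent=10., val_percent=5.)
```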
Moreover, on the PubMed dataset, i.e. the largest one, @stefanosantaris's implementation runs in 3+ minutes on my laptop. @philipjackson's implementation runs in 0.03 seconds, and in a few seconds for a graph with 1 million nodes (I removed all `assert` lines from both functions).
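If you want to check these timings yourself, a minimal benchmark might look like the following. It assumes the repo's `load_data` helper and whichever `mask_test_edges` variant you want to time; strip the `assert` lines first, as noted above.

```python
import time
from gae.input_data import load_data  # the repo's dataset loader

adj, _ = load_data('pubmed')  # the largest of the benchmark datasets

start = time.perf_counter()
mask_test_edges(adj, test_percent=10., val_percent=5.)
print("mask_test_edges took %.3fs" % (time.perf_counter() - start))
```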
As a consequence, I would recommend using #25 with the updated default parameters. :)
Hi @GuillaumeSalha, can you reproduce the results in the paper with `test_percent=10.` and `val_percent=5.` using the original implementation / the updated implementation? I still cannot reproduce them with the original one. Sad...
Hi @haorannlp!
After changing `test_percent=10.` and `val_percent=5.` in `preprocessing.py`, you need to run again:
cd ..
python setup.py install
Then, indeed, you should be able to reproduce the results from the paper.
Hi everyone,
I think what happened here is that I wrote this code along with @sbonner0 for use in a paper of our own, in which we used differently sized validation and test splits, and only submitted it as a pull request here as an afterthought. That's why my default `val_percent` and `test_percent` don't match up with @tkipf's originals; I didn't think to revert them when I made the pull request. Apologies for the inconvenience caused, and thanks to @GuillaumeSalha for spotting the issue!
Thank you buddy, @GuillaumeSalha! I can reproduce the results now.