graphsage-simple
Dataset separation
Can anyone explain the logic behind the train/valid/test node separation in this code? For the cora dataset, the 2708 nodes are shuffled, the first 1000 are taken as test nodes, the next 500 as valid nodes, and the rest as train nodes. Similarly, for the pubmed dataset, the 19717 nodes are shuffled, the first 1000 are taken as test nodes, the next 500 as valid nodes, and the rest as train nodes. So the test:valid:train proportion (in percent) is 36.9 : 18.5 : 44.6 for cora and 5.1 : 2.5 : 92.4 for pubmed.
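To make the numbers above concrete, here is a minimal sketch of the split as I understand it. It is not copied from the repository: the function names `split_nodes` and `split_percentages`, the `seed` parameter, and the use of NumPy's `default_rng` are my own assumptions for illustration.

```python
import numpy as np


def split_nodes(num_nodes, num_test=1000, num_val=500, seed=0):
    """Shuffle node ids, then cut fixed-size test and valid sets off the front.

    Hypothetical helper: mirrors the split described above, where the
    test and valid sizes are absolute counts rather than fractions.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)           # shuffled node ids 0..num_nodes-1
    test = perm[:num_test]                      # first 1000 nodes
    val = perm[num_test:num_test + num_val]     # next 500 nodes
    train = perm[num_test + num_val:]           # everything that is left
    return train, val, test


def split_percentages(num_nodes, **kwargs):
    """Return the (test, valid, train) shares in percent, rounded to 0.1."""
    train, val, test = split_nodes(num_nodes, **kwargs)
    return tuple(round(100 * len(part) / num_nodes, 1) for part in (test, val, train))


print(split_percentages(2708))    # cora   -> (36.9, 18.5, 44.6)
print(split_percentages(19717))   # pubmed -> (5.1, 2.5, 92.4)
```

Because the test and valid sizes are fixed counts, the train share grows with the size of the graph, which is why the two datasets end up with such different proportions.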
- Don't we have to keep the same test:valid:train ratio across datasets?
- How can I separate a new dataset into these categories?
I believe that we need to separate nodes into train/valid/test categories for a node classification problem. What about the link prediction problem?