dgl
ogbn-arxiv text features don't match
I noticed that the number of node abstracts in https://snap.stanford.edu/ogb/data/misc/ogbn_arxiv/titleabs.tsv.gz (179,719) is close to, but does not match, the number of nodes in the DGL graph obtained from DglNodePropPredDataset(name='ogbn-arxiv') (169,343). It's not yet clear to me why there is a discrepancy, and it makes it difficult to map the text features to the nodes in the DGL graph. Any explanation would be helpful.
This issue has been addressed in https://github.com/snap-stanford/ogb/issues/222. The reason is that not all papers listed in the node abstracts file are mapped into the graph; one example is paper id 200971, with the title "ontology as a source for rule generation". The real number of nodes can be found in the "nodeidx2paperid.csv" file, which is in the arxiv/mapping folder after unzipping.
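To make the mapping concrete, here is a minimal sketch of how the text could be aligned to graph nodes with pandas. It uses toy in-memory tables rather than the real files, and the column names ("node idx", "paper id", "title", "abstract") and the helper `align_text_to_nodes` are assumptions for illustration, so check them against the actual headers of "nodeidx2paperid.csv" and titleabs.tsv.

```python
import pandas as pd


def align_text_to_nodes(mapping: pd.DataFrame, titleabs: pd.DataFrame) -> pd.DataFrame:
    """Return one row per graph node, ordered by node index.

    Hypothetical helper: `mapping` is assumed to have columns
    "node idx" and "paper id"; `titleabs` is assumed to have
    "paper id", "title" and "abstract".
    """
    # Left join keeps exactly the papers that are nodes in the graph
    # and drops abstracts that were never mapped to a node.
    merged = mapping.merge(titleabs, on="paper id", how="left", validate="one_to_one")
    return merged.sort_values("node idx").reset_index(drop=True)


# Toy stand-in for nodeidx2paperid.csv (three nodes).
mapping = pd.DataFrame({"node idx": [0, 1, 2], "paper id": [30, 10, 20]})

# Toy stand-in for titleabs.tsv: one extra paper that is not in the graph.
titleabs = pd.DataFrame(
    {
        "paper id": [10, 20, 30, 200971],
        "title": ["a", "b", "c", "ontology as a source for rule generation"],
        "abstract": ["A", "B", "C", "D"],
    }
)

aligned = align_text_to_nodes(mapping, titleabs)
print(aligned)
```

With the real data, the two frames would presumably be loaded with `pd.read_csv` (using `sep="\t"` for the TSV), and row i of the result would then hold the text for node i of the DGL graph.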
This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you