dgl

ogbn-arxiv text features don't match

Status: Open. devinbost opened this issue 10 months ago • 2 comments

I noticed that the number of node abstracts in https://snap.stanford.edu/ogb/data/misc/ogbn_arxiv/titleabs.tsv.gz (179,719) is close to, but not the same as, the number of nodes in the DGL graph obtained from DglNodePropPredDataset(name='ogbn-arxiv') (169,343). It's not yet clear to me why there is a discrepancy, but it makes it difficult for me to map the text features of the nodes to the nodes in the DGL graph. Any explanation would be helpful.

devinbost, Apr 04 '24 12:04
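One way to see where the gap comes from is to compare the paper IDs in the two files rather than just their row counts. The Python sketch below is not from the thread. It assumes that titleabs.tsv has no header and holds tab-separated paper id, title and abstract columns, and that the mapping file has one header row followed by node idx,paper id rows. The function names are hypothetical, and the demo uses tiny in-memory stand-ins instead of the real downloads.

```python
import csv
import io


def paper_ids_from_mapping(lines):
    # Assumed layout of nodeidx2paperid.csv: one header row, then "node idx,paper id" rows.
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return {int(row[1]) for row in reader if row}


def paper_ids_from_titleabs(lines):
    # Assumed layout of titleabs.tsv: no header, tab-separated "paper id, title, abstract".
    # QUOTE_NONE keeps quote characters inside titles/abstracts literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Rows whose first field is not an integer ID are skipped defensively.
    return {int(row[0]) for row in reader if row and row[0].strip().isdigit()}


def compare_id_sets(abstract_ids, graph_ids):
    # Summarize how the two ID sets overlap.
    return {
        "abstracts": len(abstract_ids),
        "graph_nodes": len(graph_ids),
        "not_in_graph": len(abstract_ids - graph_ids),
        "missing_text": len(graph_ids - abstract_ids),
    }


if __name__ == "__main__":
    # In-memory stand-ins for the real files; IDs 1001/1002 and all text are hypothetical.
    mapping = io.StringIO("node idx,paper id\n0,1001\n1,1002\n")
    titleabs = io.StringIO(
        "1001\tfirst title\tfirst abstract\n"
        "1002\tsecond title\tsecond abstract\n"
        "200971\tontology as a source for rule generation\tsome abstract\n"
    )
    summary = compare_id_sets(paper_ids_from_titleabs(titleabs), paper_ids_from_mapping(mapping))
    print(summary)
    # prints {'abstracts': 3, 'graph_nodes': 2, 'not_in_graph': 1, 'missing_text': 0}
```

For the real files, pass a text handle such as gzip.open("titleabs.tsv.gz", "rt", encoding="utf-8", newline="") instead of the StringIO objects. If every graph node has an abstract and no paper ID is repeated, not_in_graph would come out to 179,719 - 169,343 = 10,376.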

This issue has been addressed in https://github.com/snap-stanford/ogb/issues/222. The reason is that not every paper listed in the node abstracts file is mapped into the graph; for example, paper id 200971, titled "ontology as a source for rule generation", is not a node. The actual set of nodes can be found in the "nodeidx2paperid.csv" file, which is in the arxiv/mapping folder after unzipping.

TristonNV, Apr 08 '24 22:04
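The fix described in the reply amounts to a join: look up each node's paper ID in nodeidx2paperid.csv, then fetch that paper's title and abstract from titleabs.tsv, in node-index order. Below is a minimal Python sketch under assumed file layouts (a header row in the mapping file; no header and tab-separated paper id, title, abstract columns in titleabs.tsv). All function names are hypothetical, and the .gz handling is there only because the abstracts URL in the question points at a gzip file.

```python
import csv
import gzip
import io


def _open_text(path):
    # Open a plain or gzip-compressed text file for csv reading.
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, encoding="utf-8", newline="")


def load_node_to_paper(lines):
    # Assumed layout of nodeidx2paperid.csv: header row, then "node idx,paper id".
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    node_to_paper = {}
    for row in reader:
        if row:
            node_to_paper[int(row[0])] = int(row[1])
    return node_to_paper


def load_titleabs(lines):
    # Assumed layout of titleabs.tsv: no header; "paper id, title, abstract" separated by tabs.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    texts = {}
    for row in reader:
        # Skip rows that do not start with an integer paper ID (e.g. a stray header).
        if len(row) >= 3 and row[0].strip().isdigit():
            texts[int(row[0])] = (row[1], row[2])
    return texts


def texts_in_node_order(node_to_paper, texts):
    """Return a list whose entry i is the (title, abstract) of graph node i.

    Papers that appear only in the abstracts file (such as paper id 200971)
    are never looked up, so they are dropped automatically.
    """
    ordered = []
    for idx in range(len(node_to_paper)):
        paper_id = node_to_paper[idx]  # KeyError if node indices are not 0..N-1
        ordered.append(texts[paper_id])  # KeyError if a node has no abstract row
    return ordered


def load_node_texts(mapping_path, titleabs_path):
    # Hypothetical convenience wrapper; pass the paths of the downloaded/unzipped files.
    with _open_text(mapping_path) as f:
        node_to_paper = load_node_to_paper(f)
    with _open_text(titleabs_path) as f:
        texts = load_titleabs(f)
    return texts_in_node_order(node_to_paper, texts)


if __name__ == "__main__":
    # In-memory stand-ins; IDs 1001/1002 and all text are hypothetical.
    mapping = io.StringIO("node idx,paper id\n0,1002\n1,1001\n")
    titleabs = io.StringIO(
        "1001\ttitle A\tabstract A\n"
        "1002\ttitle B\tabstract B\n"
        "200971\tontology as a source for rule generation\tabstract C\n"
    )
    print(texts_in_node_order(load_node_to_paper(mapping), load_titleabs(titleabs)))
    # prints [('title B', 'abstract B'), ('title A', 'abstract A')]
```

Entry i of the returned list should then line up with node i of the DGL graph, assuming DglNodePropPredDataset(name='ogbn-arxiv') keeps OGB's node indexing, which is what the mapping file describes. A KeyError from texts_in_node_order suggests that this assumption, or the assumed file layout, does not hold.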

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot], May 09 '24 01:05