A question about the ppi dataset

Open DinikaSen opened this issue 5 years ago • 6 comments

I would like to know exactly which dataset was used to generate the toy_ppi dataset in the example_data folder (that is, the input files used to produce the pre-processed dataset). What node features are available in the dataset? Could you share a link to where the dataset is available?

DinikaSen • May 03 '19 10:05

I think you can check https://downloads.thebiogrid.org/BioGRID to find the source data.

yashu88 • May 28 '19 04:05

@DinikaSen did you get the necessary information from the URL mentioned by @yashu88?

preetham-salehundam • Jun 04 '19 14:06

For raw data, http://snap.stanford.edu/ohmnet/ has the graph structure; http://software.broadinstitute.org/gsea/msigdb/collections.jsp has the feature and label information. c1, c3, c7 are the feature sets; GO is the label set.

RexYing • Jun 19 '19 21:06

I have a question on the feature sets. As the dimensions of c1, c3, c7 are very large, how did you represent each protein using 50-dimensional vectors?

Thanks in advance!

zch42 • Jun 08 '20 03:06

@RexYing, could you open-source your PPI data preprocessing code?

knightXun • Sep 21 '20 09:09

> I have a question on the feature sets. As the dimensions of c1, c3, c7 are very large, how did you represent each protein using 50-dimensional vectors?
>
> Thanks in advance!

I have the same question.

Sutongtong233 • Jul 14 '22 08:07
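
The thread never answers how the 50-dimensional features were produced, and the preprocessing code was not posted here. As a rough sketch only, and not the authors' confirmed method: one plausible route is to encode each protein as a binary vector of gene-set memberships (MSigDB collections are distributed in the tab-separated GMT format: set name, description, then member genes) and then reduce that wide matrix to k dimensions with PCA. The function names, the toy gene and set names, and the choice of PCA are all assumptions made for illustration.

```python
import numpy as np


def parse_gmt(text):
    """Parse GMT text: each line is name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def membership_matrix(genes, gene_sets):
    """Binary matrix X where X[i, j] = 1 if genes[i] belongs to the j-th set."""
    set_names = sorted(gene_sets)
    X = np.zeros((len(genes), len(set_names)), dtype=np.float32)
    for j, name in enumerate(set_names):
        members = gene_sets[name]
        for i, gene in enumerate(genes):
            if gene in members:
                X[i, j] = 1.0
    return X, set_names


def reduce_dim(X, k):
    """PCA via SVD (an assumed choice): center columns, keep the top-k components."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, len(S))  # cannot keep more components than the data has
    return U[:, :k] * S[:k]


# Hypothetical toy input; real input would be GMT files for c1, c3, c7.
toy_gmt = "SET_A\tna\tG1\tG2\nSET_B\tna\tG2\tG3\nSET_C\tna\tG4\n"
genes = ["G1", "G2", "G3", "G4"]
X, names = membership_matrix(genes, parse_gmt(toy_gmt))
Z = reduce_dim(X, 2)  # with real data, k=50 would match the feature size asked about
```

With the real collections, `X` would have one column per gene set (many thousands), which is the "very large" dimensionality raised in the question; whether the published features came from this kind of projection is not something the thread confirms.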