
Can you provide instructions on how to use real-world data to train the model?

Open • Dinxin opened this issue 6 years ago • 18 comments

I looked through the whole codebase and found that it only uses toy dummy data to train the model, so I don't really understand how you used the real data to train the GCN model. Can you provide code or instructions on how to use real-world data to train the model?

Dinxin, Aug 07 '18
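For readers stuck at the same point: a GCN like this one ultimately needs each relation as an adjacency matrix over integer node indices, so the first step with the real CSVs is usually mapping string identifiers to contiguous indices. The released files appear to identify drugs by STITCH compound IDs and proteins by gene IDs; treat that as an assumption. Below is a minimal, stdlib-only sketch. `build_index` and `edges_to_coo` are hypothetical helpers, not part of the Decagon repository.

```python
def build_index(ids):
    """Map each distinct identifier to a contiguous integer index.

    Sorting makes the mapping reproducible across runs.
    """
    return {node: i for i, node in enumerate(sorted(set(ids)))}


def edges_to_coo(edges, row_index, col_index, symmetric=False):
    """Convert (source_id, target_id) pairs into sorted, de-duplicated (row, col) pairs.

    With symmetric=True each edge is also added in the reverse direction,
    which only makes sense when rows and columns share one index (e.g. PPI).
    """
    coords = set()
    for a, b in edges:
        i, j = row_index[a], col_index[b]
        coords.add((i, j))
        if symmetric:
            coords.add((j, i))
    return sorted(coords)


# Toy drug-drug edges with hypothetical STITCH-style IDs.
drug_edges = [("CID000000001", "CID000000002"), ("CID000000002", "CID000000003")]
drug_index = build_index([d for edge in drug_edges for d in edge])
coo = edges_to_coo(drug_edges, drug_index, drug_index, symmetric=True)
print(coo)  # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

The resulting pairs can be loaded into SciPy with `scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n_rows, n_cols))`, where `data` is a vector of ones. For drug-drug relations you would presumably build one matrix per side-effect type. Check how the training script arranges its matrices against its toy example rather than relying on this sketch.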

It's also not clear how to get predictions from the trained model on new data or a new pair of drugs. Do I put in SIDER codes? STITCH codes? Other codes? In what format?

ddofer, Aug 21 '18
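On the prediction question: as we understand the Decagon paper, a drug pair (i, j) is scored for side effect r with a tensor-factorization decoder, g_r(i, j) = z_i^T D_r R D_r z_j. Here z_i and z_j are the learned drug embeddings, D_r is a diagonal matrix specific to side effect r, and R is shared across side effects. The probability is sigmoid(g_r). The sketch below only illustrates that arithmetic with plain lists and made-up numbers. It assumes both drugs already have embeddings, i.e. they appear in the training graph. It does not show how to extract z, D_r, or R from the repository's TensorFlow model; `side_effect_probability` is a hypothetical name.

```python
import math


def side_effect_probability(z_i, z_j, d_r, R):
    """Return sigmoid(z_i^T D_r R D_r z_j), with D_r = diag(d_r).

    z_i, z_j: embedding vectors of the two drugs (length k)
    d_r:      diagonal of the side-effect-specific matrix D_r (length k)
    R:        shared k x k matrix, given as a list of rows
    """
    k = len(z_i)
    left = [z_i[a] * d_r[a] for a in range(k)]    # row vector z_i^T D_r
    right = [d_r[b] * z_j[b] for b in range(k)]   # column vector D_r z_j
    g = sum(left[a] * R[a][b] * right[b] for a in range(k) for b in range(k))
    # Numerically stable sigmoid: avoids overflow in exp() for large |g|.
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


# Made-up 2-dimensional example values.
z_drug_a = [1.0, 0.0]
z_drug_b = [1.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(side_effect_probability(z_drug_a, z_drug_b, [2.0, 1.0], identity))
```

To rank side effects for one drug pair, you would evaluate this once per side effect r (each with its own d_r) and sort by probability.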

I am also confused about how to apply the model to the real data.

colinwxl, Nov 02 '18

Please give instructions on how to use the actual dataset with the code. It is very difficult to understand what the variables in the dummy-data code represent.

Msan1995, Nov 04 '18

I am trying to apply the code to the real datasets. As a first step, I checked whether I get the same network parameters (number of proteins, drugs, ...) as the paper. According to the paper, there should be 19,085 proteins, but from the protein-protein network (bio-decagon-ppi) I get 19,081. Has anyone tried applying the code to the real dataset, and did you get the same number of proteins for the network? Thanks.

vidarmehr, Nov 21 '18
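A quick way to check these counts yourself is to read the edge list and count distinct gene IDs. The sketch assumes bio-decagon-ppi.csv has a header row with columns named `Gene 1` and `Gene 2`. That is an assumption, so check the actual header. It reports raw rows and de-duplicated undirected edges separately, because this thread does not say which convention the paper used for its edge count. `ppi_stats` is a hypothetical helper.

```python
import csv
import io


def ppi_stats(f, col_a="Gene 1", col_b="Gene 2"):
    """Count proteins, raw rows, and distinct undirected edges in a PPI edge list.

    The default column names are assumptions about bio-decagon-ppi.csv.
    """
    proteins = set()
    undirected = set()
    n_rows = 0
    for row in csv.DictReader(f):
        a, b = row[col_a].strip(), row[col_b].strip()
        n_rows += 1
        proteins.update((a, b))
        undirected.add((a, b) if a <= b else (b, a))  # (a, b) and (b, a) count as one edge
    return {"proteins": len(proteins), "rows": n_rows, "undirected_edges": len(undirected)}


toy = io.StringIO("Gene 1,Gene 2\n114787,192\n192,114787\n192,5555\n")
print(ppi_stats(toy))  # {'proteins': 3, 'rows': 3, 'undirected_edges': 2}
```

On the real file this would be `with open("bio-decagon-ppi.csv", newline="") as f: print(ppi_stats(f))`.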

I am also confused about how to apply the model to the real data. Has anyone solved the problem? Thanks.

bbjy, Feb 26 '19

Same problem for me; I'm not quite sure how to apply it.

westzhicanchen, Mar 12 '19

@vidarmehr I also get 19,081 proteins from the protein-protein network (bio-decagon-ppi), and 1,317 side effects rather than the 1,318 mentioned in the paper. Do your parameters (number of proteins, drugs, ...) match these? Thanks.

bbjy, Apr 03 '19
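One hypothesis worth checking for the 19,081 vs. 19,085 gap is that the paper counts proteins across all files, not just the PPI network. In that case, target genes absent from the PPI list would add to the total. This is unverified, and nobody in this thread confirms it. The hypothetical helper `protein_universe` below simply compares the two gene sets. Feed it the gene IDs from bio-decagon-ppi.csv and from the drug-target file.

```python
def protein_universe(ppi_genes, target_genes):
    """Compare gene IDs seen in the PPI network with those in the drug-target file."""
    ppi = set(ppi_genes)
    targets = set(target_genes)
    return {
        "ppi": len(ppi),
        "targets_not_in_ppi": len(targets - ppi),
        "union": len(ppi | targets),
    }


# Toy IDs: gene "4" appears only as a drug target.
print(protein_universe(["1", "2", "3"], ["2", "4"]))
# {'ppi': 3, 'targets_not_in_ppi': 1, 'union': 4}
```

If `union` comes out at 19,085 on the real files, that would be consistent with this hypothesis. Otherwise the gap lies elsewhere.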

Any updates? Same issue here. We want to reproduce the paper's results.

chao1224, Jun 30 '19

@chao1224 I was not able to reproduce the paper's results, so I decided to stop working on Decagon for now.

vidarmehr, Jul 01 '19

@bbjy Sorry for my delay; I just saw your comment. As I mentioned, I am not working on Decagon anymore. Here is the data I got from the paper and from the real datasets:

  • Number of proteins: 19,085 (paper) vs. 19,081 (PPI data)
  • Number of drugs: 645 (paper) vs. 645 (polypharmacy side effect data, combo)
  • Number of protein-protein edges: 715,612 (paper) vs. 715,612 (PPI data)
  • Number of drug-drug edges: 4,651,131 (paper) vs. 4,649,441 (polypharmacy side effect data, combo)
  • Number of drug-protein edges: 18,596 (paper) vs. 18,690 (drug-target protein data, targets)

vidarmehr, Jul 01 '19
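The comparison above can be tabulated to show exactly where the released files and the paper disagree. The numbers are copied from that comment, and the script only does the subtraction.

```python
# (paper, released data) counts as reported in the comment above.
COUNTS = {
    "proteins": (19_085, 19_081),
    "drugs": (645, 645),
    "protein-protein edges": (715_612, 715_612),
    "drug-drug edges": (4_651_131, 4_649_441),
    "drug-protein edges": (18_596, 18_690),
}


def discrepancies(counts):
    """Return {name: data - paper} for every count where the two differ."""
    return {name: data - paper for name, (paper, data) in counts.items() if data != paper}


for name, diff in discrepancies(COUNTS).items():
    print(f"{name}: {diff:+,}")
# proteins: -4
# drug-drug edges: -1,690
# drug-protein edges: +94
```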

@vidarmehr I got it. Thank you so much for your reply.

bbjy, Jul 02 '19

Thanks for the reply @vidarmehr.

Just want to quickly clarify a number:

  1. In bio-decagon-targets.csv, there are 18,690 interactions.
  2. In bio-decagon-targets-all.csv, there are 131,034 interactions, and 112,438 of them are invalid (not included in the STITCH list or Gene list). Therefore, there are 131,034 - 112,438 = 18,596 valid interactions.

chao1224, Jul 02 '19
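The filtering step described above can be sketched as follows. The sketch assumes the "STITCH list" means the drugs that appear in the polypharmacy (combo) file and the "Gene list" means the proteins in the PPI network. The comment does not say exactly which lists were used, so treat both reference sets, and the `STITCH`/`Gene` column names, as assumptions. `filter_valid_targets` is a hypothetical helper.

```python
import csv
import io


def filter_valid_targets(f, valid_drugs, valid_genes, drug_col="STITCH", gene_col="Gene"):
    """Return (drug, gene) pairs whose drug and gene both appear in the reference sets.

    The default column names are assumptions about the drug-target CSV files.
    """
    kept = []
    for row in csv.DictReader(f):
        drug, gene = row[drug_col].strip(), row[gene_col].strip()
        if drug in valid_drugs and gene in valid_genes:
            kept.append((drug, gene))
    return kept


# Sanity check of the arithmetic quoted above.
assert 131_034 - 112_438 == 18_596

toy = io.StringIO("STITCH,Gene\nCID1,10\nCID1,99\nCID7,10\n")
print(filter_valid_targets(toy, valid_drugs={"CID1"}, valid_genes={"10"}))
# [('CID1', '10')]
```

On the real data, the number of pairs kept from bio-decagon-targets-all.csv can be compared with the 18,596 quoted above to test whether these reference sets are the right ones.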

Are there any updates on this issue? I was also unable to reproduce the results in the paper. They say that they focus only on predicting the 964 polypharmacy side effects that each occur in at least 500 drug combinations. However, the data they provide is the full TWOSIDES dataset. I don't know whether they filter out some side effects in the code, but I couldn't find any evidence of this.

rubjim, Nov 12 '19

@rubjim I can only get 963 side effect types that appear in more than 500 drug combinations. I think the Decagon dataset is so confusing that we could not use it in our research.

Dinxin, Nov 26 '19
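The frequency filter discussed in the last two comments can be reproduced roughly as below. The sketch assumes the combo file has `STITCH 1`, `STITCH 2`, and `Polypharmacy Side Effect` columns. It also assumes a "drug combination" means an unordered drug pair, counted once per side effect. The `inclusive` flag switches between "at least 500" (the paper's wording, as quoted above) and "more than 500" (the wording in the reply). Whether that difference accounts for 963 vs. 964 is untested. `frequent_side_effects` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def frequent_side_effects(f, min_pairs=500, inclusive=True):
    """Return (side effects meeting the threshold, total number of side-effect types).

    Each side effect is counted over distinct unordered drug pairs, so duplicate
    rows or reversed pairs (a, b) / (b, a) are counted once. Column names are assumptions.
    """
    pairs_per_effect = defaultdict(set)
    for row in csv.DictReader(f):
        a, b = row["STITCH 1"].strip(), row["STITCH 2"].strip()
        pair = (a, b) if a <= b else (b, a)
        pairs_per_effect[row["Polypharmacy Side Effect"].strip()].add(pair)

    def meets(n):
        return n >= min_pairs if inclusive else n > min_pairs

    kept = {se for se, pairs in pairs_per_effect.items() if meets(len(pairs))}
    return kept, len(pairs_per_effect)


toy_csv = (
    "STITCH 1,STITCH 2,Polypharmacy Side Effect\n"
    "CID1,CID2,C0001\n"
    "CID2,CID1,C0001\n"   # reversed duplicate of the row above
    "CID1,CID3,C0001\n"
    "CID2,CID3,C0002\n"
)
print(frequent_side_effects(io.StringIO(toy_csv), min_pairs=2))                   # ({'C0001'}, 2)
print(frequent_side_effects(io.StringIO(toy_csv), min_pairs=2, inclusive=False))  # (set(), 2)
```

Running both settings with `min_pairs=500` on the real combo file would show whether any side effect sits exactly at the threshold.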

Was anyone ever able to reproduce the results? Or at least get it running properly?

chimkens, Dec 12 '20

@Dinxin I agree with you; that's what I also get when I filter the side effects myself. However, they claim to predict 964, which doesn't correspond to the actual numbers in the dataset. @christina-s-wang, at least I wasn't able to do it.

rubjim, Dec 13 '20

Does no one care about the people asking for help here? I am in the same spot.

maryamag85, Sep 02 '21

To use this code with real data and Python 3.6, try this fork: https://github.com/DeepVivo/decagon

avi-pomicell, Dec 18 '22