Missing files in DLEPS/data

Open Nelhachem opened this issue 3 years ago • 16 comments

Hi, thanks for providing this nice, novel DL tool. However, some files are missing, and we would appreciate it if you could add them soon.

  • /data/denseweight.h5
  • /data/benchmark.csv
  • /data/DLEPS_30000_tune_gvae10000.h5

Without the first two files, the DLEPS algorithm won't run in either Colab or Jupyter notebooks. Thanks!
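A quick way to confirm which files are absent before running the notebooks is sketched below. It is only a minimal check using the file names listed above; the `data` directory location and the split into required and optional files (based on the statement that the first two are needed) are assumptions, not something taken from the DLEPS code.

```python
from pathlib import Path

# File names come from the list in this issue; treating the third one as
# optional is an assumption based on "without the first two files".
REQUIRED = ["denseweight.h5", "benchmark.csv"]
OPTIONAL = ["DLEPS_30000_tune_gvae10000.h5"]


def missing_files(data_dir, names):
    """Return the names that do not exist as regular files under data_dir."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).is_file()]


def check_data_dir(data_dir="data"):
    """Raise FileNotFoundError if a required file is absent.

    Returns the list of optional files that are missing.
    """
    missing = missing_files(data_dir, REQUIRED)
    if missing:
        raise FileNotFoundError(
            f"Missing required files in {data_dir}: {', '.join(missing)}"
        )
    return missing_files(data_dir, OPTIONAL)
```

Calling `check_data_dir("DLEPS/data")` (a hypothetical path) before loading the model gives a clearer error than a failure deep inside the notebook.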

Nelhachem avatar Jul 13 '21 03:07 Nelhachem

We ran into the same problem.

ssq1993 avatar Aug 05 '21 09:08 ssq1993

It is quite puzzling that none of the authors replied to my email. It is a nice Nature Biotechnology publication; however, we have the right to understand why some files are missing. If they are available from public sources, we would appreciate a link or some other way to download the h5 file.

Nelhachem avatar Aug 05 '21 12:08 Nelhachem

Hi! Same issue here. The denseweight.h5 file corresponds to the weights for inferring the expression of the 12K genes from the 978 landmark genes. I managed to find a file from the LINCS project that should correspond to these weights, but the other two files are still missing. It would be safer if the authors also provided denseweight.h5, or at least the link or source they took it from.

GemaRG96 avatar Sep 02 '21 13:09 GemaRG96

Hi GemaRG96, I might have found the VAE HDF5 file: https://github.com/mkusner/grammarVAE/blob/master/pretrained/zinc_vae_grammar_L56_E100_val.hdf5 Would you please share the link to the denseweight.h5 file? It should be somewhere on the LINCS website, but the authors of the Nat Biotech paper did not add this information on GitHub.

Nelhachem avatar Sep 02 '21 19:09 Nelhachem

Hi! Here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92742 Inside the 'GSE92742_Broad_LINCS_auxiliary_datasets.tar.gz' file you will find 'DS_GEO_OLS_WEIGHTS_n979x21290.gctx', which contains the weights for the 21K inferred genes. The problem is that it is not an .h5 file and it covers more than the 12K genes used in DLEPS, so to use it we would have to create the .h5 file ourselves and make several assumptions about the gene order, etc. Therefore, even with the weights, I don't think we can use them without some guidance.
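To make the gene-order concern concrete, here is a minimal sketch of subsetting and reordering the columns of a weight matrix so they follow a target gene list. It uses toy data and plain Python lists. Reading the .gctx file, writing the .h5 file, and the gene order DLEPS actually expects are not shown and remain unknown; the helper name `align_columns` and the gene IDs are hypothetical.

```python
def align_columns(weights, source_genes, target_genes):
    """Reorder and subset the columns of `weights` to follow `target_genes`.

    weights: list of rows, each row holding one value per source gene.
    source_genes: column labels of `weights`, in file order.
    target_genes: the gene order the downstream model is assumed to expect.
    """
    index = {gene: i for i, gene in enumerate(source_genes)}
    if len(index) != len(source_genes):
        raise ValueError("duplicate gene IDs in source_genes")
    absent = [g for g in target_genes if g not in index]
    if absent:
        raise KeyError(f"{len(absent)} target genes not in source, e.g. {absent[:5]}")
    cols = [index[g] for g in target_genes]
    return [[row[c] for c in cols] for row in weights]


# Toy data: 3 rows and 4 inferred genes (hypothetical IDs, not LINCS values).
weights = [
    [0.1, 0.2, 0.3, 0.4],
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
]
source_genes = ["g1", "g2", "g3", "g4"]
target_genes = ["g3", "g1"]

aligned = align_columns(weights, source_genes, target_genes)
print(aligned)  # [[0.3, 0.1], [3.0, 1.0], [30.0, 10.0]]
```

Failing loudly on missing or duplicated gene IDs matters here, because a silently misaligned matrix would still produce numbers that look plausible.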

GemaRG96 avatar Sep 03 '21 07:09 GemaRG96

Hi! Yes, that makes sense. One approach would be to test a ground-truth signature and see whether the h5 file from LINCS works as expected, until we get a reasonable answer from the authors.
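Such a ground-truth check could look roughly like the sketch below: apply the candidate weights to a landmark profile and correlate the result with a known full profile. Everything here is an assumption for illustration: that the first weight row is an intercept (which would be one reading of 979 = 978 + 1), that inference is a plain linear map, and the toy numbers themselves. It is not the DLEPS or LINCS implementation.

```python
import math


def infer_genes(landmarks, weights):
    """Linear inference under the assumption that weights[0] is an intercept
    row and weights[1:] holds one row per landmark gene."""
    if len(weights) != len(landmarks) + 1:
        raise ValueError("expected one intercept row plus one row per landmark")
    n_out = len(weights[0])
    return [
        weights[0][j] + sum(x * weights[i + 1][j] for i, x in enumerate(landmarks))
        for j in range(n_out)
    ]


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy example: 2 landmarks, 3 inferred genes (made-up values).
weights = [
    [1.0, 0.0, -1.0],  # intercept row (assumed)
    [2.0, 1.0, 0.0],   # landmark 1
    [0.0, 1.0, 3.0],   # landmark 2
]
landmarks = [1.0, 2.0]
known = [3.1, 2.9, 5.2]  # stand-in for a measured ground-truth profile

predicted = infer_genes(landmarks, weights)
print(predicted)  # [3.0, 3.0, 5.0]
r = pearson(predicted, known)
```

A correlation close to 1 across many signatures would suggest the weights and gene order are consistent; a low or negative value would point to a misaligned or wrong file.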

Nelhachem avatar Sep 03 '21 12:09 Nelhachem

@Nelhachem, I got no email replies either. I don't know why they are hiding these files.

I think there's an updated weight file here: GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx.gz

Do you have any idea what the benchmark.csv file is? How can one be generated?

zqfang avatar Sep 09 '21 09:09 zqfang

We have the same problem. We would be grateful if the authors could reply to this issue. The model cannot be run at all unless the necessary files are provided, including the related input files, weights, and so on.

CompBioT avatar Sep 25 '21 06:09 CompBioT

We ran into the same problem.

wuys13 avatar Feb 05 '22 02:02 wuys13

We also ran into the same problem. I can't believe it hasn't been solved.

jfckkiu avatar Apr 30 '22 08:04 jfckkiu

@Nelhachem, I got no email replies either. I don't know why they are hiding these files.

I think there's an updated weight file here: GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx.gz

Do you have any idea what the benchmark.csv file is? How can one be generated?

How did you process this file?

joey0214 avatar Nov 02 '22 11:11 joey0214

Excuse me, can you use this model?

tqinger avatar Nov 21 '22 07:11 tqinger

Excuse me, can you use this model?

Nope, the "denseweight.h5" file is missing.

joey0214 avatar Nov 21 '22 09:11 joey0214

The "denseweight.h5" link is https://kaggle.com/datasets/b0a096e3c550146f2a786f0ffd3c8bd37d68b04c7b09697efd282f91f8f6e36f Was it recently updated by the author? I would also like to know where the "benchmark.csv" file is. I hope to get the authors' guidance. Has anybody tried the script?

dreamfly999 avatar Dec 08 '22 08:12 dreamfly999

Cool paper. benchmark.csv (the average expression levels for the 978 genes) is essential for calculating enrichment.

ACDBio avatar May 03 '23 15:05 ACDBio

Hi, where can we get the benchmark.csv file?

muralikrishnasn avatar Jul 26 '23 10:07 muralikrishnasn