bug in loading scPerturb datasets
Hi @kexinhuang12345, as you know, the ReplogleWeissman2022 study has three datasets. Currently, as I understand it, the ReplogleWeissman2022_K562_gwps data is not uploaded. However, I noticed some weird behavior when I tried to load it. I had ReplogleWeissman2022_k562_essential already downloaded in a local folder, and when I then tried loading scperturb_gene_ReplogleWeissman2022_K562_gwps, it reported "Found local copy...":
>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets')
Found local copy...
Loading...
Looking at the number of perturbations, what loaded is not the _gwps dataset: it should be 9867, but it is 2058 (the same count as the _essential dataset):
>>> test_load.adata.obs.perturbation.unique()
Length: 2058
Looking more carefully, I pointed it at an empty folder and noticed that, for some reason, it downloads the wrong file for _gwps:
>>> test_load = PerturbOutcome('scperturb_gene_ReplogleWeissman2022_K562_gwps','Datasets/new/')
Downloading...
█████████████████████████████████████████████| 1.55G/1.55G [01:09<00:00, 22.2MiB/s]
Loading...
~: ls Datasets/new/
scperturb_gene_ReplogleWeissman2022_k562_essential.h5ad
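A hedged guess at the mechanism (the lookup logic below is a hypothetical sketch, not TDC's actual code): if the local-copy check matches dataset names by a shared prefix and ignores case, a request for `...K562_gwps` can resolve to the already-downloaded `...k562_essential.h5ad` file, producing exactly the "Found local copy..." behavior above:

```python
from pathlib import Path
from typing import Optional

def find_local_copy(name: str, folder: str) -> Optional[Path]:
    """Hypothetical loose lookup illustrating the suspected bug
    (NOT TDC's real implementation): match any .h5ad file that shares
    the study/cell-line prefix, ignoring case and the dataset suffix."""
    # Drop the last token ("gwps" / "essential") and lowercase the rest,
    # e.g. "scperturb_gene_replogleweissman2022_k562".
    prefix = "_".join(name.lower().split("_")[:-1])
    for f in Path(folder).glob("*.h5ad"):
        if f.stem.lower().startswith(prefix):
            return f  # the wrong dataset's file can win here
    return None
```

With this kind of matching, `find_local_copy("scperturb_gene_ReplogleWeissman2022_K562_gwps", ...)` happily returns the `_essential` file, since both names share the `..._k562` prefix once case and the suffix are ignored.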
cc @amva13
Originally posted by @abearab in https://github.com/mims-harvard/TDC/issues/239#issuecomment-2082088585
@kexinhuang12345 – Hi Kexin, I was wondering if you could take a look at this issue. Thanks!
Hi! Sorry for the delay. I think it is due to some name-caching bugs; currently we do not have the gwps version uploaded to Dataverse. Will fix it after the NeurIPS deadline!
> Hi! Sorry for the delay. I think it is due to some name-caching bugs; currently we do not have the gwps version uploaded to Dataverse.

I see, that makes sense.

> Will fix it after the NeurIPS deadline!

Thanks!
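Given the diagnosis above, a defensive fix might be to resolve the local copy by an exact, case-sensitive filename before ever reporting "Found local copy" (sketch only; the helper name and file layout are hypothetical, not TDC's API):

```python
from pathlib import Path
from typing import Optional

def resolve_dataset(name: str, folder: str) -> Optional[Path]:
    """Exact lookup: report a local copy only when the file for *this*
    dataset name exists. Hypothetical helper, not TDC code."""
    path = Path(folder) / f"{name}.h5ad"
    return path if path.is_file() else None
```

With this check, a request for the `_gwps` dataset in a folder that holds only `_essential` returns `None`, so the loader would fall through to a fresh download instead of silently reusing the wrong file.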