pySCENIC
[BUG] early termination of pyscenic ctx with no error message
My pyscenic ctx run keeps getting terminated partway through. The puzzling part is that it doesn't print any error message, as shown below:
$ singularity shell -B path/to/working/directory aertslab-pyscenic-0.12.1.sif
singularity> bash pyscenic_23.sh
# running..
# running..
# running..
2024-02-09 12:16:56,692 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): All regulons derived.
2024-02-09 12:16:56,692 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): All regulons derived.
2024-02-09 12:16:56,696 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): Done.
2024-02-09 12:16:56,696 - pyscenic.prune - INFO - Worker mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings(1): Done.
Singularity>
It returns to my terminal without the message about writing an output file, which I would expect to see after a successful run.
My script is as follows:
$ cat pyscenic_23.sh
#!/bin/bash
pyscenic ctx \
scrnaseq-pyscenic-tac1-chat/WT_all_adj.csv \
cistarget-db/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
cistarget-db/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
--annotations_fname cistarget-db/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl \
--expression_mtx_fname scrnaseq-pyscenic-tac1-chat/WT_all.loom \
-o scrnaseq-pyscenic-tac1-chat/WT_all_reg.csv \
--mask_dropouts \
--num_workers 32
This run ended up creating an empty (zero-byte) output file, shown here:
$ la | grep WT_all_reg.csv
-rw-rw----. 1 username xxxx 0 Feb 9 15:47 WT_all_reg.csv
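One way to make a silent termination like this visible is to run the command from a small wrapper that reports the exit status and the size of the output file. The sketch below is only illustrative: `run_and_report` is a hypothetical helper, not part of pySCENIC, and it assumes a POSIX system, where a negative return code from `subprocess` means the process was killed by a signal (for example SIGKILL from an out-of-memory killer). It only sees the exit status of the top-level process, so a killed worker process may not show up here.

```python
import os
import signal
import subprocess
import sys


def run_and_report(cmd, output_path):
    """Run cmd, then summarize its exit status and the output file size.

    Hypothetical helper for illustration; not part of pySCENIC.
    """
    result = subprocess.run(cmd)
    code = result.returncode
    if code < 0:
        # POSIX: a negative return code means "terminated by signal -code".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = str(-code)
        status = f"killed by signal {name}"
    elif code == 0:
        status = "exited normally"
    else:
        status = f"exited with error code {code}"
    # None means the output file was never created.
    size = os.path.getsize(output_path) if os.path.exists(output_path) else None
    return {"returncode": code, "status": status, "output_bytes": size}
```

For the script above, it could be called as `run_and_report(["bash", "pyscenic_23.sh"], "scrnaseq-pyscenic-tac1-chat/WT_all_reg.csv")`. A zero return code together with a zero-byte output file would point away from a crash of the top-level process.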
I assume my issue is unrelated to mismatched gene symbols across the input files, since I found that over 1000 gene symbols are shared across the ranking (.feather), adjacency (adj.csv), TF (allTFs_mm.txt), and motif annotation (.tbl) files.
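import pandas as pd

# f_db_names, tfs, adj_csv, and f_motif_path hold the input file paths (defined elsewhere)
# Load ranking databases
feather = f_db_names['raw'][0]
feather = pd.read_feather(feather)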
# Unique gene symbols
feather_g = list(set(list(feather.columns))) # from feather file
tfs_g = list(pd.read_csv(tfs, header=None).iloc[:,0]) # from TF list
adj_g = list(set(pd.read_csv(adj_csv).loc[:, 'target'].to_list())) # from adjacency
anno_g = list(set(pd.read_csv(f_motif_path, sep="\t").loc[:,'gene_name']))
# All gene symbols (w duplicates)
all_g = tfs_g + adj_g + feather_g + anno_g
# Retrieve gene symbols intersecting across the feather, tfs, adj, and tbl files
count = pd.Series(all_g).value_counts().to_dict()
# Save gene symbols found in every file to a list
g4 = []
for key, value in count.items():
    if value == 4:
        g4.append(key)
# >>> len(tfs_g)
# 1860
# >>> len(adj_g)
# 20746
# >>> len(feather_g)
# 24069
# >>> len(anno_g)
# 1412
# >>> g4[:6]
# ['Cpeb1', 'Pml', 'Rcor1', 'Rad21', 'Ascc1', 'Prox1']
# >>> len(g4)
# 1054
assert len(g4) > 0, "Ensure to have gene symbols matched across all input files."
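The same check can also be written with set intersections, which stays correct even if one of the lists contains a gene symbol more than once (the counting approach above relies on every list being free of duplicates). A minimal sketch, where `shared_symbols` is a hypothetical helper name and the toy lists stand in for the real inputs:

```python
def shared_symbols(*symbol_lists):
    """Return the sorted symbols present in every one of the given lists."""
    if not symbol_lists:
        return []
    common = set(symbol_lists[0])
    for symbols in symbol_lists[1:]:
        common &= set(symbols)
    return sorted(common)


# Toy data for illustration only; with the variables above this would be
# shared_symbols(tfs_g, adj_g, feather_g, anno_g).
tfs_demo = ["Pml", "Prox1", "Prox1"]  # note the duplicate
adj_demo = ["Pml", "Prox1", "Rad21"]
feather_demo = ["Prox1", "Pml", "Cpeb1"]
anno_demo = ["Pml", "Prox1"]

print(shared_symbols(tfs_demo, adj_demo, feather_demo, anno_demo))
# prints ['Pml', 'Prox1']
```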
Some tests done so far:
- skipping the --no_pruning parameter
- skipping the --mask_dropouts parameter
- using v9 ranking files for mm10 (mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather, mm10__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.genes_vs_motifs.rankings.feather) and/or v9 motif file (motifs-v9-nr.mgi-m0.001-o0.0.tbl)
  - v9 ranking file + v10 motif file
  - v10 ranking file + v9 motif file
  - v9 ranking file + v9 motif file
I saw a few issue reports about empty output, but in those cases the message about writing an output file was at least printed.
My environment is an HPC system, summarized here:
- OS
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.9 (Ootpa)
Release: 8.9
Codename: Ootpa
- pySCENIC version:
Singularity> pyscenic -h
usage: pyscenic [-h] {grn,add_cor,ctx,aucell} ...
Single-Cell rEgulatory Network Inference and Clustering (0.12.1+0.gce41b61.dirty)
- Installation method:
singularity build aertslab-pyscenic-0.12.1.sif docker://aertslab/pyscenic:0.12.1
- Run environment: CLI in singularity
- Package versions:
singularity-ce version 4.0.1
I think my CLI code is relatively straightforward. I have also tried with R and Python, but neither was successful. I would appreciate any hints or suggestions. Thank you very much for your time!
How much memory did you assign to the job? Try lowering the number of workers.
Interesting. I got it to work after changing the number of CPUs (--num_workers) from 32 to 8 (memory was 100g).
Singularity> cat pyscenic_ctx_test.sh
#!/bin/bash
pyscenic ctx \
scrnaseq-pyscenic-tac1-chat/WT_all_adj.csv \
cistarget-db/mm10_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
cistarget-db/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
--annotations_fname cistarget-db/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl \
--expression_mtx_fname scrnaseq-pyscenic-tac1-chat/WT_all.loom \
-o scrnaseq-pyscenic-tac1-chat/WT_all_reg.csv \
--mask_dropouts \
--num_workers 8
Singularity> bash pyscenic_ctx_test.sh
(running...)
2024-02-14 15:44:04,062 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): All regulons derived.
2024-02-14 15:44:04,062 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): All regulons derived.
2024-02-14 15:44:04,081 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): Done.
2024-02-14 15:44:04,081 - pyscenic.prune - INFO - Worker mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings(3): Done.
2024-02-14 15:44:04,147 - pyscenic.cli.pyscenic - INFO - Writing results to file.
$ la | grep WT_all_reg.csv
-rw-rw----. 1 username xxxx 2.5M Feb 14 15:44 WT_all_reg.csv
I had never thought that more CPUs could cause problems. Do you have any idea what happened here?
Thank you so much for the discussion, @ghuls!
Each worker loads the databases again, so memory use grows with the number of workers and you need enough memory for all of them.
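As a rough illustration of that point, the worker count can be bounded by the memory budget. The sketch below assumes, as a simplification, that each worker holds one full copy of every ranking database and that the in-memory size is some multiple of the on-disk size. The names `database_bytes`, `max_workers_for_memory`, and `overhead_factor`, and the example sizes, are hypothetical and not taken from pySCENIC or measured on this run.

```python
import os


def database_bytes(paths):
    """Total on-disk size of the given database files, in bytes."""
    return sum(os.path.getsize(p) for p in paths)


def max_workers_for_memory(db_bytes, memory_bytes, overhead_factor=2.0):
    """Estimate how many workers fit in memory_bytes (never below 1).

    Assumes each worker needs db_bytes * overhead_factor of memory.
    overhead_factor is a guess, not a measured pySCENIC value.
    """
    per_worker = db_bytes * overhead_factor
    if per_worker <= 0:
        raise ValueError("database size and overhead_factor must be positive")
    return max(1, int(memory_bytes // per_worker))


GIB = 1024 ** 3
# Made-up example: 3 GiB of databases on disk, 100 GiB of job memory.
print(max_workers_for_memory(3 * GIB, 100 * GIB))
# prints 16
```

The real per-worker footprint is best measured on the actual job, for instance from the scheduler's memory accounting, and then used in place of the guessed `overhead_factor`.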