cellsnp-lite
runtime and expected output
Hi, I ran cellsnp-lite with one BAM file, and it has now been running for 3 weeks. I was wondering if this is to be expected. Is there any information on the expected output files? There are already several files in my output folder, and I am not sure whether the program has actually finished and only appears to still be running. Best, Anne
Hi, the files in the output folder are probably temporary files (with suffixes such as .0, .1, etc.). When the program finishes, the output folder should look like this example.
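A minimal shell sketch for checking whether a run looks finished. It assumes the final files written in mode 1a are cellSNP.base.vcf, cellSNP.samples.tsv and the cellSNP.tag.AD/DP/OTH.mtx matrices (each possibly gzipped when --gzip is set), and that leftover temporary files end in a numeric suffix such as .0 or .1. These file names are an assumption, so please compare them with the example output folder for your version; check_cellsnp_done is a hypothetical helper, not part of cellsnp-lite.

```shell
#!/bin/sh
# check_cellsnp_done DIR
# Hypothetical helper: reports whether DIR looks like a finished cellsnp-lite run.
check_cellsnp_done() {
  dir=$1
  status=0
  # Assumed final outputs (mode 1a); each may or may not carry a .gz suffix.
  for f in cellSNP.base.vcf cellSNP.samples.tsv \
           cellSNP.tag.AD.mtx cellSNP.tag.DP.mtx cellSNP.tag.OTH.mtx; do
    if [ ! -e "$dir/$f" ] && [ ! -e "$dir/$f.gz" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  # Temporary files are assumed to end in a numeric suffix (.0, .1, ...).
  # Note: any unrelated file named like "name.2" in DIR would also match.
  for t in "$dir"/*.[0-9]*; do
    [ -e "$t" ] || continue   # the glob matched nothing
    echo "temporary file still present: $t"
    status=1
  done
  [ "$status" -eq 0 ] && echo "output looks complete"
  return "$status"
}
```

For example, check_cellsnp_done /vireo/test lists any missing or temporary files and returns a non-zero status while the run still looks incomplete.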
How many cells does the BAM file contain? Sometimes it can take a long time for cellsnp-lite to genotype a big dataset, especially in mode 2a (i.e., to pile up whole chromosomes for 10x data). Could you also share your command line and the version of cellsnp-lite?
Best, Xianjie
Hi, the BAM file should contain about 15,600 cells. I used mode 1a with the region VCF from here. I'm running this on a cluster with cellsnp-lite version 1.2.2. I'm not exactly sure what you mean by "share your command line".
Best, Anne
The command line contains all the parameters you used to run cellsnp-lite, e.g.,
cellsnp-lite -s $BAM -b $BARCODE -O $OUT_DIR -R $REGION_VCF -p 20 --minMAF 0.1 --minCOUNT 20 --gzip
The 15,600 cells suggest that the BAM is probably a big 10x dataset. To speed things up, you may:
- check whether the cell barcodes are "filtered", i.e., from filtered_gene_bc_matrices instead of from raw_gene_bc_matrices in the cellranger output folder (update -b);
- try to use the SNP list from the AF5e2 VCF file instead of AF5e4 in this folder, since it contains fewer SNPs (update -R);
- use more threads or cores (update -p).
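Before relaunching, it can help to size the job by counting the barcodes passed to -b and the SNPs passed to -R. A minimal sketch with two hypothetical helpers, count_barcodes and count_snps; it assumes gzip-compressed inputs (bgzipped VCFs are gzip-compatible, so gzip -dc reads them too).

```shell
#!/bin/sh
# Hypothetical helpers for sizing a cellsnp-lite job before launching it.

# Number of cell barcodes (one per line) in a gzipped barcodes.tsv.gz.
count_barcodes() {
  gzip -dc "$1" | wc -l | tr -d ' '
}

# Number of variant records (non-header lines) in a gzipped or bgzipped VCF.
count_snps() {
  gzip -dc "$1" | grep -vc '^#'
}
```

For example, count_barcodes /path/to/raw_feature_bc_matrix/barcodes.tsv.gz usually reports far more barcodes than the filtered list (assuming a Cell Ranger 3+ layout, where the filtered barcodes sit in filtered_feature_bc_matrix), and count_snps /vireo/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz shows how many positions will be piled up.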
Ah ok, the command I used looks like this:
cellsnp-lite -s path/to/possorted_genome_bam.bam -b /path/to/raw_feature_bc_matrix/barcodes.tsv.gz -O /vireo/test -R /vireo/genome1K.phase3.SNP_AF5e2.chr1toX.hg38.vcf.gz -p 20 --minMAF 0.1 --minCOUNT 20 --gzip
Yes indeed, it is a 10x dataset and I used the raw barcodes. I will try out your suggestions, thank you very much for your help!
For 15k cells, roughly how long did this take to run?
Hello, I ran cellsnp-lite and it failed with "[E::idx_find_and_load] Could not retrieve index file for '/h/sunnan/VKHDATA/vkha/gex_possorted_bam.bam'". What should I do next? Thank you very much!
You'll need to (re-)index your BAM file:
samtools index /h/sunnan/VKHDATA/vkha/gex_possorted_bam.bam
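If you script this step, a small guard can build the index only when neither a .bai nor a .csi file sits next to the BAM. A minimal sketch with a hypothetical helper, ensure_bam_index; it assumes the BAM is coordinate-sorted (samtools index requires this, and Cell Ranger's possorted BAMs already are) and that samtools is on the PATH.

```shell
#!/bin/sh
# ensure_bam_index BAM [THREADS]
# Hypothetical helper: index a coordinate-sorted BAM unless an index already exists.
ensure_bam_index() {
  bam=$1
  threads=${2:-4}
  if [ -e "$bam.bai" ] || [ -e "$bam.csi" ]; then
    echo "index found for $bam"
    return 0
  fi
  # samtools writes $bam.bai by default; -@ adds extra compression threads.
  samtools index -@ "$threads" "$bam"
}
```

For example, ensure_bam_index /h/sunnan/VKHDATA/vkha/gex_possorted_bam.bam 8 creates gex_possorted_bam.bam.bai on the first call and does nothing on later calls. If the BAM was regenerated after indexing, delete the old .bai first so it gets rebuilt.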