
Discussion question -- is it normal to take so long?

Open dagarfield opened this issue 1 year ago • 3 comments

I am abusing this tool a bit: instead of CITE-seq barcodes, I am counting barcodes from a TRACE-seq barcoding experiment that has (empirically) about 66,000 barcodes. The result is slower than expected. Any thoughts? As the log below shows, the current pace isn't really scalable.

(This worked great with our pilot, but that run had far fewer cells. Here I've cranked the cell count up to include the expected ~11k cells plus extra for ambient correction/estimation.)

% CITE-seq-Count -R1 $read1 -R2 $read2 -t output.csv -cbf 1 -cbl 16 -umif 17 -umil 28 -cells 30000 -trim 25 -o cite_out --threads 7
Counting number of reads
Started mapping
Processing 66,484,476 reads
CITE-seq-Count is running with 7 cores.
Processed 1,000,000 reads in 10.0 hours, 12.0 minutes, 4.962 seconds. Total reads: 1,000,000 in child 26029
Processed 1,000,000 reads in 10.0 hours, 14.0 minutes, 47.28 seconds. Total reads: 1,000,000 in child 26030
Processed 1,000,000 reads in 10.0 hours, 16.0 minutes, 14.01 seconds. Total reads: 1,000,000 in child 26033
Processed 1,000,000 reads in 10.0 hours, 17.0 minutes, 18.27 seconds. Total reads: 1,000,000 in child 26035
Processed 1,000,000 reads in 10.0 hours, 17.0 minutes, 48.96 seconds. Total reads: 1,000,000 in child 26032
Processed 1,000,000 reads in 10.0 hours, 18.0 minutes, 18.83 seconds. Total reads: 1,000,000 in child 26031
Processed 1,000,000 reads in 10.0 hours, 20.0 minutes, 44.28 seconds. Total reads: 1,000,000 in child 26034
Processed 1,000,000 reads in 10.0 hours, 9.0 minutes, 53.68 seconds. Total reads: 2,000,000 in child 26029
Processed 1,000,000 reads in 10.0 hours, 10.0 minutes, 52.67 seconds. Total reads: 2,000,000 in child 26030
Processed 1,000,000 reads in 10.0 hours, 14.0 minutes, 17.73 seconds. Total reads: 2,000,000 in child 26033
Processed 1,000,000 reads in 10.0 hours, 13.0 minutes, 21.22 seconds. Total reads: 2,000,000 in child 26032
Processed 1,000,000 reads in 10.0 hours, 13.0 minutes, 53.29 seconds. Total reads: 2,000,000 in child 26035
Processed 1,000,000 reads in 10.0 hours, 13.0 minutes, 51.79 seconds. Total reads: 2,000,000 in child 26031
Processed 1,000,000 reads in 10.0 hours, 15.0 minutes, 43.0 seconds. Total reads: 2,000,000 in child 26034
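To put a number on "not scalable", the per-chunk timings in the log can be extrapolated to the whole run. The following is a minimal sketch, not part of CITE-seq-Count: the function names are made up for illustration, and it assumes that the reads are split evenly across the 7 children, that the children run in parallel, and that the rate stays constant for the rest of the run.

```python
import re

# Two lines copied verbatim from the log above (child 26029).
LOG_LINES = [
    "Processed 1,000,000 reads in 10.0 hours, 12.0 minutes, 4.962 seconds. "
    "Total reads: 1,000,000 in child 26029",
    "Processed 1,000,000 reads in 10.0 hours, 9.0 minutes, 53.68 seconds. "
    "Total reads: 2,000,000 in child 26029",
]

PATTERN = re.compile(
    r"Processed ([\d,]+) reads in ([\d.]+) hours, ([\d.]+) minutes, ([\d.]+) seconds"
)


def parse_chunk(line):
    """Return (reads_in_chunk, hours_taken) for one 'Processed ...' log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    reads = int(match.group(1).replace(",", ""))
    hours, minutes, seconds = (float(match.group(i)) for i in (2, 3, 4))
    return reads, hours + minutes / 60 + seconds / 3600


def estimate_total_hours(lines, total_reads, n_children):
    """Extrapolate wall-clock hours for the full run.

    Assumption (hypothetical): each child gets total_reads / n_children
    reads and all children run concurrently at the observed rate.
    """
    chunks = [parse_chunk(line) for line in lines]
    reads_done = sum(reads for reads, _ in chunks)
    hours_spent = sum(hours for _, hours in chunks)
    hours_per_read = hours_spent / reads_done
    return (total_reads / n_children) * hours_per_read


if __name__ == "__main__":
    hours = estimate_total_hours(LOG_LINES, total_reads=66_484_476, n_children=7)
    print(f"Estimated wall-clock time: {hours:.0f} hours ({hours / 24:.1f} days)")
```

Under those assumptions the full run comes out at roughly four days of wall-clock time, which matches the impression that the current pace is impractical.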

dagarfield, Apr 11 '23 18:04

I should probably mention that my version is conda-installed (https://anaconda.org/bioconda/cite-seq-count), so I think it is v1.4.4. The Python version is 3.7.12 (as installed by mamba/conda).

dagarfield, Apr 11 '23 18:04

And there are 64k tags in that -t file, which I am starting to think is the essential issue here.
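A toy illustration of why the tag count could matter this much. This is not CITE-seq-Count's actual code, and the function names are invented for the example. It only shows that if a matcher compares each read against every tag while tolerating mismatches, the per-read cost grows linearly with the number of tags, whereas an exact-match lookup in a set does not.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def match_by_scan(read, tags, max_mismatches=1):
    """Mismatch-tolerant matching by brute force.

    Every read is compared with every tag, so the cost per read is
    proportional to len(tags). Returns the closest tag within
    max_mismatches, or None if no tag is close enough.
    """
    best_tag, best_dist = None, max_mismatches + 1
    for tag in tags:
        dist = hamming(read, tag)
        if dist < best_dist:
            best_tag, best_dist = tag, dist
    return best_tag


def match_exact(read, tag_set):
    """Exact matching via a set: roughly constant cost per read,
    independent of how many tags there are, but no error tolerance."""
    return read if read in tag_set else None


if __name__ == "__main__":
    tags = ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG"]
    tag_set = set(tags)
    read_with_error = "CCCCACCC"  # one mismatch relative to the second tag
    print(match_by_scan(read_with_error, tags))  # finds the closest tag
    print(match_exact(read_with_error, tag_set))  # no exact hit
```

If the tool did behave like the brute-force scan, 64k tags across the 66,484,476 reads in this run would mean on the order of 4 × 10^12 read-versus-tag comparisons, which would be consistent with the slowdown relative to a small pilot tag list.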

dagarfield, Apr 12 '23 02:04

Hello @dagarfield, I'm guessing this is too heavy for the tool. Have you tried running it without cell barcode and UMI correction? I'm afraid this software was not built for datasets as big as this one.

Hoohm, Jul 22 '23 09:07