CITE-seq-Count
Script won't complete
Hello, I have been trying to use the CITE-seq-Count script, but cannot get it to complete. It keeps getting hung up at the "Looking for a Whitelist" process. Any ideas what could be causing this? I have attached my logs below.
/nethome/cndd3/miniconda3/bin/CITE-seq-Count -R1 Multi_hybrid_R1.fq.gz -R2 Multi_hybrid_R2.fq.gz -t invivo_cite.csv -cbf 1 -cbl 48 -umif 50 -umil 57 -cells 11000 -o /cite_seq
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-kbkieitb because the default path (/home/barcode/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[WARNING] Read1 length is 150bp but you are using 56bp for Cell and UMI barcodes combined. This might lead to wrong cell attribution and skewed umi counts.
Counting number of reads
Started mapping
Processing 488,884,653 reads
CITE-seq-Count is running with 64 cores.
Processed 1,000,000 reads in 22.55 seconds. Total reads: 1,000,000 in child 98148
Processed 1,000,000 reads in 22.48 seconds. Total reads: 2,000,000 in child 98148
Processed 1,000,000 reads in 22.61 seconds. Total reads: 3,000,000 in child 98148
Processed 1,000,000 reads in 1.0 minute, 20.64 seconds. Total reads: 1,000,000 in child 98149
Processed 1,000,000 reads in 22.5 seconds. Total reads: 4,000,000 in child 98148
Processed 1,000,000 reads in 22.29 seconds. Total reads: 2,000,000 in child 98149
Processed 1,000,000 reads in 21.93 seconds. Total reads: 5,000,000 in child 98148
Processed 1,000,000 reads in 22.36 seconds. Total reads: 3,000,000 in child 98149
Processed 1,000,000 reads in 22.63 seconds. Total reads: 6,000,000 in child 98148
Processed 1,000,000 reads in 2.0 minutes, 16.75 seconds. Total reads: 1,000,000 in child 98150
Processed 1,000,000 reads in 21.86 seconds. Total reads: 4,000,000 in child 98149
Processed 1,000,000 reads in 22.27 seconds. Total reads: 7,000,000 in child 98148
Processed 1,000,000 reads in 22.22 seconds. Total reads: 2,000,000 in child 98150
Processed 1,000,000 reads in 22.33 seconds. Total reads: 5,000,000 in child 98149
Mapping done for process 98148. Processed 7,638,822 reads
Processed 1,000,000 reads in 23.53 seconds. Total reads: 3,000,000 in child 98150
Processed 1,000,000 reads in 24.98 seconds. Total reads: 6,000,000 in child 98149
Processed 1,000,000 reads in 3.0 minutes, 17.18 seconds. Total reads: 1,000,000 in child 98151
Processed 1,000,000 reads in 25.44 seconds. Total reads: 4,000,000 in child 98150
Processed 1,000,000 reads in 27.64 seconds. Total reads: 7,000,000 in child 98149
Processed 1,000,000 reads in 25.57 seconds. Total reads: 2,000,000 in child 98151
Processed 1,000,000 reads in 24.58 seconds. Total reads: 5,000,000 in child 98150
Mapping done for process 98149. Processed 7,638,822 reads
Processed 1,000,000 reads in 26.42 seconds. Total reads: 3,000,000 in child 98151
Processed 1,000,000 reads in 24.31 seconds. Total reads: 6,000,000 in child 98150
Processed 1,000,000 reads in 4.0 minutes, 18.95 seconds. Total reads: 1,000,000 in child 98152
Processed 1,000,000 reads in 26.36 seconds. Total reads: 4,000,000 in child 98151
Processed 1,000,000 reads in 23.74 seconds. Total reads: 7,000,000 in child 98150
Processed 1,000,000 reads in 24.35 seconds. Total reads: 2,000,000 in child 98152
Mapping done for process 98150. Processed 7,638,822 reads
Processed 1,000,000 reads in 23.96 seconds. Total reads: 5,000,000 in child 98151
Processed 1,000,000 reads in 25.97 seconds. Total reads: 3,000,000 in child 98152
Processed 1,000,000 reads in 29.64 seconds. Total reads: 5,000,000 in child 98205
Processed 1,000,000 reads in 31.02 seconds. Total reads: 3,000,000 in child 98206
Processed 1,000,000 reads in 1.0 hour, 8.0 minutes, 51.7 seconds. Total reads: 1,000,000 in child 98207
Processed 1,000,000 reads in 31.01 seconds. Total reads: 6,000,000 in child 98205
Processed 1,000,000 reads in 30.5 seconds. Total reads: 4,000,000 in child 98206
Processed 1,000,000 reads in 28.49 seconds. Total reads: 2,000,000 in child 98207
Processed 1,000,000 reads in 31.46 seconds. Total reads: 7,000,000 in child 98205
Processed 1,000,000 reads in 30.3 seconds. Total reads: 5,000,000 in child 98206
Mapping done for process 98205. Processed 7,638,822 reads
Processed 1,000,000 reads in 27.55 seconds. Total reads: 3,000,000 in child 98207
Processed 1,000,000 reads in 1.0 hour, 10.0 minutes, 1.908 seconds. Total reads: 1,000,000 in child 98208
Processed 1,000,000 reads in 30.68 seconds. Total reads: 6,000,000 in child 98206
Processed 1,000,000 reads in 1.0 hour, 10.0 minutes, 15.98 seconds. Total reads: 1,000,000 in child 98209
Processed 1,000,000 reads in 29.15 seconds. Total reads: 4,000,000 in child 98207
Processed 1,000,000 reads in 29.11 seconds. Total reads: 2,000,000 in child 98208
Processed 1,000,000 reads in 31.77 seconds. Total reads: 7,000,000 in child 98206
Processed 1,000,000 reads in 29.44 seconds. Total reads: 2,000,000 in child 98209
Processed 1,000,000 reads in 28.81 seconds. Total reads: 5,000,000 in child 98207
Processed 1,000,000 reads in 27.64 seconds. Total reads: 3,000,000 in child 98208
Mapping done for process 98206. Processed 7,638,822 reads
Processed 1,000,000 reads in 26.27 seconds. Total reads: 6,000,000 in child 98207
Processed 1,000,000 reads in 31.28 seconds. Total reads: 3,000,000 in child 98209
Processed 1,000,000 reads in 30.15 seconds. Total reads: 4,000,000 in child 98208
Processed 1,000,000 reads in 30.85 seconds. Total reads: 7,000,000 in child 98207
Processed 1,000,000 reads in 32.54 seconds. Total reads: 4,000,000 in child 98209
Processed 1,000,000 reads in 24.25 seconds. Total reads: 5,000,000 in child 98208
Mapping done for process 98207. Processed 7,638,822 reads
Processed 1,000,000 reads in 1.0 hour, 12.0 minutes, 8.088 seconds. Total reads: 1,000,000 in child 98210
Processed 1,000,000 reads in 26.06 seconds. Total reads: 5,000,000 in child 98209
Processed 1,000,000 reads in 29.38 seconds. Total reads: 6,000,000 in child 98208
Processed 1,000,000 reads in 29.26 seconds. Total reads: 2,000,000 in child 98210
Processed 1,000,000 reads in 28.84 seconds. Total reads: 6,000,000 in child 98209
Processed 1,000,000 reads in 32.72 seconds. Total reads: 7,000,000 in child 98208
Processed 1,000,000 reads in 28.32 seconds. Total reads: 3,000,000 in child 98210
Mapping done for process 98208. Processed 7,638,822 reads
Processed 1,000,000 reads in 29.1 seconds. Total reads: 7,000,000 in child 98209
Processed 1,000,000 reads in 1.0 hour, 13.0 minutes, 26.72 seconds. Total reads: 1,000,000 in child 98211
Mapping done for process 98209. Processed 7,638,822 reads
Processed 1,000,000 reads in 27.73 seconds. Total reads: 4,000,000 in child 98210
Processed 1,000,000 reads in 29.42 seconds. Total reads: 2,000,000 in child 98211
Processed 1,000,000 reads in 26.99 seconds. Total reads: 5,000,000 in child 98210
Processed 1,000,000 reads in 27.07 seconds. Total reads: 3,000,000 in child 98211
Processed 1,000,000 reads in 25.64 seconds. Total reads: 6,000,000 in child 98210
Processed 1,000,000 reads in 27.17 seconds. Total reads: 4,000,000 in child 98211
Processed 1,000,000 reads in 26.57 seconds. Total reads: 7,000,000 in child 98210
Mapping done for process 98210. Processed 7,638,822 reads
Processed 1,000,000 reads in 27.38 seconds. Total reads: 5,000,000 in child 98211
Processed 1,000,000 reads in 26.33 seconds. Total reads: 6,000,000 in child 98211
Processed 1,000,000 reads in 26.91 seconds. Total reads: 7,000,000 in child 98211
Mapping done for process 98211. Processed 7,638,867 reads
Mapping done
Merging results
Correcting cell barcodes
Looking for a whitelist
Hello @curtisd0886,
this is a very interesting library you are running there.
I would suggest skipping the barcode correction. A cell barcode of around 50bp is going to take ages to run, because the correction step generates every possible barcode at distance 1 from each observed barcode in order to "correct" the data.
Any particular reason the cell barcode is so long?
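To give a feel for why barcode length matters here, below is a minimal Python sketch that enumerates every sequence at Hamming distance 1 from a barcode. It is not CITE-seq-Count's actual code; the function names are made up for illustration and the four-letter alphabet is an assumption. It only shows how the number of candidates grows with barcode length.

```python
ALPHABET = "ACGT"  # assumption: only the four standard bases are substituted


def hamming1_neighbors(barcode):
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, base in enumerate(barcode):
        for alt in ALPHABET:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def neighbor_count(length, alphabet_size=4):
    """Number of distance-1 neighbors for a barcode of the given length."""
    return length * (alphabet_size - 1)


if __name__ == "__main__":
    print(len(list(hamming1_neighbors("ACGTAC"))))  # one 6 bp barcode
    print(neighbor_count(48))  # a 48 bp "barcode", as in -cbf 1 -cbl 48
    print(neighbor_count(18))  # three 6 bp barcodes joined together
```

The per-barcode count grows only linearly with length, but it is multiplied by the number of distinct barcodes observed, and a long "barcode" that includes constant regions will tend to produce many more distinct sequences from sequencing errors alone.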
Thanks for the reply. So it is not actually 50 bp long; there are three barcodes of 6 bp each, separated by constant regions. Is it possible to tell CITE-seq-Count to look only at the three barcode regions?
Here is an example of our barcode region (B is barcode and N is UMI): BBBBBBCGACTCACTACAGGGBBBBBBTCGGTGACACGATCGBBBBBBNNNNNN
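The layout in that example can be written down as fixed positions. The sketch below (Python, with a hypothetical `split_read` helper) pulls out the three 6 bp barcodes and the UMI and checks that both constant regions match exactly. The positions are read off the example string only; they are an assumption, and they do not agree with the `-umif 50 -umil 57` values in the original command, so the real layout should be confirmed before relying on them.

```python
# Layout assumed from the example string (0-based, end-exclusive):
#   0-6 barcode 1 | 6-21 linker 1 | 21-27 barcode 2 |
#   27-42 linker 2 | 42-48 barcode 3 | 48-54 UMI
LINKER1 = "CGACTCACTACAGGG"
LINKER2 = "TCGGTGACACGATCG"


def split_read(seq):
    """Return (18 bp combined barcode, 6 bp UMI), or None if the layout does not match."""
    if len(seq) < 54:
        return None
    if seq[6:21] != LINKER1 or seq[27:42] != LINKER2:
        return None  # exact match only; no tolerance for sequencing errors
    barcode = seq[0:6] + seq[21:27] + seq[42:48]
    umi = seq[48:54]
    return barcode, umi


if __name__ == "__main__":
    read = "AAAAAA" + LINKER1 + "CCCCCC" + LINKER2 + "GGGGGG" + "TTTTTT"
    print(split_read(read))
```

Running this over a sample of reads and counting how many return None would be one way to check whether the assumed positions are right before rewriting the whole file.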
Sadly no, CSC does not have this capacity.
Do you know if the last barcode is enough to discriminate your cells? If it is, you can use just the last 6 bp barcode.
If not, then the "easy" way would be to filter out the constant regions and run the new R1 files without those.
Yeah, that could work. Do you know of a way to do that without losing the barcode in between the two constant regions?
I would suggest cutadapt: https://cutadapt.readthedocs.io/en/stable/guide.html?highlight=adapters#multiple-adapters It seems it would support this if you provide the constant regions as adapters.
A custom bash script using awk and sed would also work pretty easily.
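As an alternative to cutadapt or awk/sed, the same idea can be sketched in Python: stream the gzipped R1 file and keep only the barcode and UMI positions from each read, slicing the quality string the same way. This is only a sketch under assumptions: the positions come from the example layout posted above, and `rewrite_r1` and `SEGMENTS` are hypothetical names. Every read is kept and sliced by position, with no filtering, so R1 stays in the same order and of the same length as R2.

```python
import gzip

# (start, end) pairs, 0-based and end-exclusive, assumed from the example layout:
# barcode 1, barcode 2, barcode 3, UMI
SEGMENTS = [(0, 6), (21, 27), (42, 48), (48, 54)]


def collapse(text):
    """Concatenate the kept segments of a sequence or quality string."""
    return "".join(text[start:end] for start, end in SEGMENTS)


def rewrite_r1(in_path, out_path):
    """Write a new gzipped FASTQ with the constant regions removed; return the record count."""
    n = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            header = fin.readline()
            if not header:
                break  # end of file
            seq = fin.readline().rstrip("\n")
            plus = fin.readline()
            qual = fin.readline().rstrip("\n")
            fout.write(header)
            fout.write(collapse(seq) + "\n")
            fout.write(plus)
            fout.write(collapse(qual) + "\n")  # quality must match the new length
            n += 1
    return n
```

With this layout the rewritten R1 would carry the cell barcode at bases 1 to 18 and the UMI at bases 19 to 24, so the `-cbf`, `-cbl`, `-umif` and `-umil` values from the original command would need to be changed to match.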
This issue should be fixed with version 1.4.4
Which version are you running?
Maybe just give it a couple more hours.
On Mon, 8 Feb 2021, 03:22 YingzhengXu wrote:
Hi @Hoohm,
Thank you again for developing this amazing tool! I had a similar running-time issue (2 R1, 2 R2) where the program seemed to get stuck at some point while merging results. But when I ran only 1 input (1 R1, 1 R2), the issue disappeared. Any insights into what's going on?
I was able to get CITE-seq-Count working by using cutadapt and awk to create one barcode. I am still running into an issue where the cell barcodes generated by CITE-seq-Count don't match my RNA barcodes, so I get an error in Seurat. I have tried using a whitelist based on barcodes from the RNA dataset, with no luck. Is there anything you can recommend?
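One way to narrow down a mismatch like this is to compare the two barcode lists directly, under a few common transformations. The Python sketch below is a hypothetical diagnostic, not something from CITE-seq-Count or Seurat. It counts how many barcodes are shared as-is, after dropping a trailing numeric suffix such as "-1", and after reverse-complementing. Whether any of these apply to this dataset is an assumption; the thread does not say what the RNA barcodes look like.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strip_suffix(barcode):
    """Drop a trailing '-<digits>' suffix (for example '-1'), if present."""
    head, sep, tail = barcode.rpartition("-")
    return head if sep and tail.isdigit() else barcode


def overlap_report(adt_barcodes, rna_barcodes):
    """Count shared barcodes under three hypotheses about how the names differ."""
    adt = set(adt_barcodes)
    rna = set(rna_barcodes)
    adt_plain = {strip_suffix(b) for b in adt}
    rna_plain = {strip_suffix(b) for b in rna}
    return {
        "exact": len(adt & rna),
        "suffix_stripped": len(adt_plain & rna_plain),
        "reverse_complement": len({revcomp(b) for b in adt_plain} & rna_plain),
    }


if __name__ == "__main__":
    # Toy barcodes, invented for illustration only.
    print(overlap_report(["AAAC", "GGTT"], ["AAAC-1", "TTTT-1"]))
```

If none of the three counts is close to the number of cells, the difference is more likely in how the barcode was assembled (segment order or length) than in naming.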