
Script won't complete

Open curtisd0886 opened this issue 3 years ago • 7 comments

Hello, I have been trying to use the CITE-seq-Count script, but cannot get it to complete. It keeps getting hung up at the "Looking for a Whitelist" process. Any ideas what could be causing this? I have attached my logs below.

/nethome/cndd3/miniconda3/bin/CITE-seq-Count -R1 Multi_hybrid_R1.fq.gz -R2 Multi_hybrid_R2.fq.gz -t invivo_cite.csv -cbf 1 -cbl 48 -umif 50 -umil 57 -cells 11000 -o /cite_seq
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-kbkieitb because the default path (/home/barcode/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
[WARNING] Read1 length is 150bp but you are using 56bp for Cell and UMI barcodes combined. This might lead to wrong cell attribution and skewed umi counts.

Counting number of reads
Started mapping
Processing 488,884,653 reads
CITE-seq-Count is running with 64 cores.
Processed 1,000,000 reads in 22.55 seconds. Total reads: 1,000,000 in child 98148
Processed 1,000,000 reads in 22.48 seconds. Total reads: 2,000,000 in child 98148
Processed 1,000,000 reads in 22.61 seconds. Total reads: 3,000,000 in child 98148
Processed 1,000,000 reads in 1.0 minute, 20.64 seconds. Total reads: 1,000,000 in child 98149
Processed 1,000,000 reads in 22.5 seconds. Total reads: 4,000,000 in child 98148
Processed 1,000,000 reads in 22.29 seconds. Total reads: 2,000,000 in child 98149
Processed 1,000,000 reads in 21.93 seconds. Total reads: 5,000,000 in child 98148
Processed 1,000,000 reads in 22.36 seconds. Total reads: 3,000,000 in child 98149
Processed 1,000,000 reads in 22.63 seconds. Total reads: 6,000,000 in child 98148
Processed 1,000,000 reads in 2.0 minutes, 16.75 seconds. Total reads: 1,000,000 in child 98150
Processed 1,000,000 reads in 21.86 seconds. Total reads: 4,000,000 in child 98149
Processed 1,000,000 reads in 22.27 seconds. Total reads: 7,000,000 in child 98148
Processed 1,000,000 reads in 22.22 seconds. Total reads: 2,000,000 in child 98150
Processed 1,000,000 reads in 22.33 seconds. Total reads: 5,000,000 in child 98149
Mapping done for process 98148. Processed 7,638,822 reads
Processed 1,000,000 reads in 23.53 seconds. Total reads: 3,000,000 in child 98150
Processed 1,000,000 reads in 24.98 seconds. Total reads: 6,000,000 in child 98149
Processed 1,000,000 reads in 3.0 minutes, 17.18 seconds. Total reads: 1,000,000 in child 98151
Processed 1,000,000 reads in 25.44 seconds. Total reads: 4,000,000 in child 98150
Processed 1,000,000 reads in 27.64 seconds. Total reads: 7,000,000 in child 98149
Processed 1,000,000 reads in 25.57 seconds. Total reads: 2,000,000 in child 98151
Processed 1,000,000 reads in 24.58 seconds. Total reads: 5,000,000 in child 98150
Mapping done for process 98149. Processed 7,638,822 reads
Processed 1,000,000 reads in 26.42 seconds. Total reads: 3,000,000 in child 98151
Processed 1,000,000 reads in 24.31 seconds. Total reads: 6,000,000 in child 98150
Processed 1,000,000 reads in 4.0 minutes, 18.95 seconds. Total reads: 1,000,000 in child 98152
Processed 1,000,000 reads in 26.36 seconds. Total reads: 4,000,000 in child 98151
Processed 1,000,000 reads in 23.74 seconds. Total reads: 7,000,000 in child 98150
Processed 1,000,000 reads in 24.35 seconds. Total reads: 2,000,000 in child 98152
Mapping done for process 98150. Processed 7,638,822 reads
Processed 1,000,000 reads in 23.96 seconds. Total reads: 5,000,000 in child 98151
Processed 1,000,000 reads in 25.97 seconds. Total reads: 3,000,000 in child 98152
Processed 1,000,000 reads in 29.64 seconds. Total reads: 5,000,000 in child 98205
Processed 1,000,000 reads in 31.02 seconds. Total reads: 3,000,000 in child 98206
Processed 1,000,000 reads in 1.0 hour, 8.0 minutes, 51.7 seconds. Total reads: 1,000,000 in child 98207
Processed 1,000,000 reads in 31.01 seconds. Total reads: 6,000,000 in child 98205
Processed 1,000,000 reads in 30.5 seconds. Total reads: 4,000,000 in child 98206
Processed 1,000,000 reads in 28.49 seconds. Total reads: 2,000,000 in child 98207
Processed 1,000,000 reads in 31.46 seconds. Total reads: 7,000,000 in child 98205
Processed 1,000,000 reads in 30.3 seconds. Total reads: 5,000,000 in child 98206
Mapping done for process 98205. Processed 7,638,822 reads
Processed 1,000,000 reads in 27.55 seconds. Total reads: 3,000,000 in child 98207
Processed 1,000,000 reads in 1.0 hour, 10.0 minutes, 1.908 seconds. Total reads: 1,000,000 in child 98208
Processed 1,000,000 reads in 30.68 seconds. Total reads: 6,000,000 in child 98206
Processed 1,000,000 reads in 1.0 hour, 10.0 minutes, 15.98 seconds. Total reads: 1,000,000 in child 98209
Processed 1,000,000 reads in 29.15 seconds. Total reads: 4,000,000 in child 98207
Processed 1,000,000 reads in 29.11 seconds. Total reads: 2,000,000 in child 98208
Processed 1,000,000 reads in 31.77 seconds. Total reads: 7,000,000 in child 98206
Processed 1,000,000 reads in 29.44 seconds. Total reads: 2,000,000 in child 98209
Processed 1,000,000 reads in 28.81 seconds. Total reads: 5,000,000 in child 98207
Processed 1,000,000 reads in 27.64 seconds. Total reads: 3,000,000 in child 98208
Mapping done for process 98206. Processed 7,638,822 reads
Processed 1,000,000 reads in 26.27 seconds. Total reads: 6,000,000 in child 98207
Processed 1,000,000 reads in 31.28 seconds. Total reads: 3,000,000 in child 98209
Processed 1,000,000 reads in 30.15 seconds. Total reads: 4,000,000 in child 98208
Processed 1,000,000 reads in 30.85 seconds. Total reads: 7,000,000 in child 98207
Processed 1,000,000 reads in 32.54 seconds. Total reads: 4,000,000 in child 98209
Processed 1,000,000 reads in 24.25 seconds. Total reads: 5,000,000 in child 98208
Mapping done for process 98207. Processed 7,638,822 reads
Processed 1,000,000 reads in 1.0 hour, 12.0 minutes, 8.088 seconds. Total reads: 1,000,000 in child 98210
Processed 1,000,000 reads in 26.06 seconds. Total reads: 5,000,000 in child 98209
Processed 1,000,000 reads in 29.38 seconds. Total reads: 6,000,000 in child 98208
Processed 1,000,000 reads in 29.26 seconds. Total reads: 2,000,000 in child 98210
Processed 1,000,000 reads in 28.84 seconds. Total reads: 6,000,000 in child 98209
Processed 1,000,000 reads in 32.72 seconds. Total reads: 7,000,000 in child 98208
Processed 1,000,000 reads in 28.32 seconds. Total reads: 3,000,000 in child 98210
Mapping done for process 98208. Processed 7,638,822 reads
Processed 1,000,000 reads in 29.1 seconds. Total reads: 7,000,000 in child 98209
Processed 1,000,000 reads in 1.0 hour, 13.0 minutes, 26.72 seconds. Total reads: 1,000,000 in child 98211
Mapping done for process 98209. Processed 7,638,822 reads
Processed 1,000,000 reads in 27.73 seconds. Total reads: 4,000,000 in child 98210
Processed 1,000,000 reads in 29.42 seconds. Total reads: 2,000,000 in child 98211
Processed 1,000,000 reads in 26.99 seconds. Total reads: 5,000,000 in child 98210
Processed 1,000,000 reads in 27.07 seconds. Total reads: 3,000,000 in child 98211
Processed 1,000,000 reads in 25.64 seconds. Total reads: 6,000,000 in child 98210
Processed 1,000,000 reads in 27.17 seconds. Total reads: 4,000,000 in child 98211
Processed 1,000,000 reads in 26.57 seconds. Total reads: 7,000,000 in child 98210
Mapping done for process 98210. Processed 7,638,822 reads
Processed 1,000,000 reads in 27.38 seconds. Total reads: 5,000,000 in child 98211
Processed 1,000,000 reads in 26.33 seconds. Total reads: 6,000,000 in child 98211
Processed 1,000,000 reads in 26.91 seconds. Total reads: 7,000,000 in child 98211
Mapping done for process 98211. Processed 7,638,867 reads
Mapping done
Merging results
Correcting cell barcodes
Looking for a whitelist

curtisd0886 avatar Jan 26 '21 11:01 curtisd0886

Hello @curtisd0886,

this is a very interesting library you are running there.

I would propose skipping the barcode correction. A cell barcode of 50bp is going to take ages to run, because the correction step generates all possible barcodes at distance 1 in order to "correct" the data.
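To give a feel for why the barcode length matters here, the following is a minimal Python sketch of enumerating the distance-1 neighbours of a barcode. It is not CITE-seq-Count's actual code: the function names are hypothetical, and it assumes "distance 1" means a single substitution over the alphabet A, C, G, T (no insertions, deletions, or N).

```python
from typing import Iterator

ALPHABET = "ACGT"  # assumption: substitutions over the four bases only


def neighbors_at_distance_one(barcode: str) -> Iterator[str]:
    """Yield every sequence that differs from `barcode` at exactly one position."""
    for i, original in enumerate(barcode):
        for base in ALPHABET:
            if base != original:
                yield barcode[:i] + base + barcode[i + 1:]


def count_candidates(barcode_length: int, n_barcodes: int) -> int:
    """Number of single-substitution candidates generated for n_barcodes barcodes."""
    return 3 * barcode_length * n_barcodes


if __name__ == "__main__":
    # A 48-base barcode (as set by -cbf 1 -cbl 48) versus an 18-base one.
    print(count_candidates(48, 1))
    print(count_candidates(18, 1))
```

Under these assumptions the number of candidates grows linearly with the barcode length and with the number of distinct barcodes being corrected, and each candidate string is itself longer, so a 48-base barcode is considerably more expensive to handle than an 18-base one.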

Any particular reason the cell barcode is so long?

Hoohm avatar Jan 27 '21 15:01 Hoohm

Thanks for the reply. It is not actually 50 bp long: there are three barcodes of 6 bp each, separated by constant regions. Is it possible to tell CITE-seq-Count to look only at the three barcode regions?

Here is an example of our barcode region (B is barcode and N is UMI): BBBBBBCGACTCACTACAGGGBBBBBBTCGGTGACACGATCGBBBBBBNNNNNN

curtisd0886 avatar Jan 27 '21 16:01 curtisd0886

Sadly no, CSC does not have this capacity.

Do you know if the last barcode is enough to discriminate your cells? If it is, you can use just that last 6 bp barcode.

If not, then the "easy" way would be to filter out the constant regions and run the new R1 files without those.

Hoohm avatar Jan 27 '21 17:01 Hoohm

Yeah, that could work. Do you know of a way to do that without losing the barcode in between the two constant regions?

curtisd0886 avatar Jan 28 '21 02:01 curtisd0886

I would propose the use of cutadapt: https://cutadapt.readthedocs.io/en/stable/guide.html?highlight=adapters#multiple-adapters It seems it would support this if you provide the constant regions as adapters.

A custom bash script using awk and sed would also work pretty easily.
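As an alternative illustration of the same idea, here is a minimal Python sketch that rewrites R1 so that only the three barcode blocks and the UMI remain. It is an assumption-laden sketch, not what was actually run in this thread: the slice positions are taken from the example layout above (6 bp barcode, 15 bp constant region, 6 bp barcode, 15 bp constant region, 6 bp barcode, 6 bp UMI), reads are assumed to have no insertions or deletions, the FASTQ is assumed to use strict 4-line records, and the function names and file paths are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple

# 0-based, end-exclusive slices to keep, derived from the example layout:
# BBBBBB + 15 bp constant + BBBBBB + 15 bp constant + BBBBBB + NNNNNN
KEEP_SLICES: List[Tuple[int, int]] = [(0, 6), (21, 27), (42, 48), (48, 54)]


def collapse(text: str, slices: List[Tuple[int, int]] = KEEP_SLICES) -> str:
    """Concatenate the kept slices, dropping the constant regions."""
    return "".join(text[start:end] for start, end in slices)


def collapse_records(lines: Iterable[str]) -> Iterator[str]:
    """Collapse the sequence and quality lines of 4-line FASTQ records."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # line 2 is the sequence, line 4 the qualities
            line = collapse(line)
        yield line + "\n"


def collapse_fastq(in_path: str, out_path: str) -> None:
    """Stream a gzipped R1 FASTQ and write the collapsed version."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        fout.writelines(collapse_records(fin))
```

Because every record is kept and only trimmed, the read order stays in sync with the untouched R2 file. With this example layout the collapsed cell barcode would occupy bases 1 to 18 and the UMI bases 19 to 24, which would correspond to `-cbf 1 -cbl 18 -umif 19 -umil 24`; the original command used different UMI positions, so these values should be checked against the real library. Fixed offsets will mis-slice reads that contain indels, which is where an adapter-based tool such as cutadapt is more robust.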

Hoohm avatar Jan 28 '21 07:01 Hoohm

This issue should be fixed with version 1.4.4

Which version are you running?

Maybe just give it a couple more hours.

On Mon, 8 Feb 2021, 03:22 YingzhengXu wrote:

Hi @Hoohm https://github.com/Hoohm,

Thank you again for developing this amazing tool! I had a similar running time issue (2 R1, 2 R2), where the program seemed to get stuck at some point while merging results. But when I ran only 1 input (1 R1, 1 R2), the issue disappeared. Any insights into what's going on?


Hoohm avatar Feb 08 '21 06:02 Hoohm

I was able to get CITE-seq-Count working by using cutadapt and awk to create one barcode. I am still running into an issue where the cell barcodes generated by CITE-seq-Count don't match my RNA barcodes, so I get an error in Seurat. I have tried using a whitelist based on barcodes from the RNA dataset, with no luck. Is there anything you can recommend?
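The thread does not say what caused the mismatch. One way to narrow it down is to measure how many barcodes the two datasets share under a few simple normalizations. The Python sketch below does that; the function names are hypothetical, and the two transformations it tries (dropping a trailing suffix such as `-1`, and reverse-complementing) are assumptions about common causes, not a diagnosis of this dataset.

```python
from typing import Dict, Iterable

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def strip_suffix(barcode: str) -> str:
    """Drop anything from the first '-' onward, e.g. 'ACGT-1' -> 'ACGT'."""
    return barcode.split("-", 1)[0]


def reverse_complement(barcode: str) -> str:
    """Return the reverse complement of a DNA barcode."""
    return barcode.translate(_COMPLEMENT)[::-1]


def overlap_report(
    adt_barcodes: Iterable[str], rna_barcodes: Iterable[str]
) -> Dict[str, int]:
    """Count shared barcodes between two lists under each normalization."""
    adt_raw, rna_raw = set(adt_barcodes), set(rna_barcodes)
    adt = {strip_suffix(b) for b in adt_raw}
    rna = {strip_suffix(b) for b in rna_raw}
    return {
        "exact": len(adt_raw & rna_raw),
        "suffix_stripped": len(adt & rna),
        "suffix_stripped_reverse_complement": len(
            {reverse_complement(b) for b in adt} & rna
        ),
    }


if __name__ == "__main__":
    # Toy barcodes for illustration only, not taken from the dataset in this thread.
    print(overlap_report(["ACGT", "TTTT"], ["ACGT-1", "AAAA-1"]))
```

If one of the normalized counts is much higher than the exact count, applying the same transformation to the barcode names before merging the two assays is a reasonable next step. If all counts stay near zero, the barcodes are probably being built from different parts of the read in the two pipelines.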

curtisd0886 avatar Feb 15 '21 14:02 curtisd0886