
Very few subreads and consensus reads despite many reads after preprocessing

Open • kvg opened this issue 4 years ago • 1 comment

Hello, I'm testing out C3POa v2.2.3 on a small test dataset (176,000 reads from a much larger PromethION run). I'm hoping to use C3POa's demultiplexing feature, and I've prepared a splints file with four sequences. The initial processing looks good:

$ python3 C3POa.py -r /data/chunk.fastq -s /data/splints.fasta -l 100 -d 500 -g 1000 -o out
Aligning splints to reads with blat
Preprocessing:  99%|█████████████████████████████████████████████████████████████████████████▌| 176/177 [02:09<00:00,  1.36it/s]
Catting psls: 100%|██████████████████████████████████████████████████████████████████████████| 176/176 [00:01<00:00, 129.95it/s]
Removing preprocessing files: 100%|█████████████████████████████████████████████████████████| 176/176 [00:00<00:00, 2590.99it/s]
Calling consensi:   0%|                                                                                 | 0/177 [02:25<?, ?it/s]
Catting consensus reads: 100%|█████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 11949.58it/s]
Catting subreads: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 7898.00it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 4450.05it/s]
Catting consensus reads: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9177.91it/s]
Catting subreads: 100%|███████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 8104.33it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 4262.17it/s]
Catting consensus reads: 100%|███████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 13879.81it/s]
Catting subreads: 100%|██████████████████████████████████████████████████████████████████████| 87/87 [00:00<00:00, 11671.71it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 87/87 [00:00<00:00, 4974.09it/s]
Catting consensus reads: 100%|████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 7833.72it/s]
Catting subreads: 100%|██████████████████████████████████████████████████████████████████████| 82/82 [00:00<00:00, 10553.00it/s]
Removing files: 100%|█████████████████████████████████████████████████████████████████████████| 82/82 [00:00<00:00, 4555.04it/s]
(lr-c3poa) root@f8924132b3ed:/#

$ cat out/c3poa.log
C3POa version: v2.2.3
Total reads: 176000
No splint reads: 37306 (21.20%)
Under len cutoff: 0 (0.00%)
Total thrown away reads: 37306 (21.20%)
Reads after preprocessing: 138694
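In the log above, the counts add up: thrown-away reads equal no-splint reads plus under-cutoff reads, and reads after preprocessing equal total reads minus thrown-away reads. A minimal Python sketch for checking this programmatically, assuming the `Label: count (pct%)` line layout shown above; `parse_c3poa_log` is a hypothetical helper, not part of C3POa.

```python
import re

# Matches summary lines such as "No splint reads: 37306 (21.20%)".
# Lines whose value is not an integer (e.g. "C3POa version: v2.2.3") are skipped.
SUMMARY_LINE = re.compile(r"^(?P<label>[^:]+):\s*(?P<count>\d+)")


def parse_c3poa_log(text: str) -> dict:
    """Hypothetical helper: map each 'Label: <int>' line of a c3poa.log to its count."""
    counts = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            counts[match.group("label").strip()] = int(match.group("count"))
    return counts


log_text = """C3POa version: v2.2.3
Total reads: 176000
No splint reads: 37306 (21.20%)
Under len cutoff: 0 (0.00%)
Total thrown away reads: 37306 (21.20%)
Reads after preprocessing: 138694"""

c = parse_c3poa_log(log_text)
thrown = c["No splint reads"] + c["Under len cutoff"]
kept = c["Total reads"] - c["Total thrown away reads"]

print(thrown == c["Total thrown away reads"])  # True
print(kept == c["Reads after preprocessing"])  # True
print(f"{100 * c['No splint reads'] / c['Total reads']:.2f}%")  # 21.20%
```

The same checks hold for the second log in this thread (1,505,291 + 15 = 1,505,306 and 1,687,451 - 1,505,306 = 182,145).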

However, when I check the output subread and consensus files, I see very few entries:

# grep -c '^[>@]' out/10x_Splint_*/*
out/10x_Splint_1/R2C2_Consensus.fasta:4
out/10x_Splint_1/R2C2_Subreads.fastq:96
out/10x_Splint_2/R2C2_Consensus.fasta:1
out/10x_Splint_2/R2C2_Subreads.fastq:60
out/10x_Splint_3/R2C2_Consensus.fasta:19
out/10x_Splint_3/R2C2_Subreads.fastq:282
out/10x_Splint_4/R2C2_Consensus.fasta:13
out/10x_Splint_4/R2C2_Subreads.fastq:325
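`grep -c '^[>@]'` counts FASTA records exactly, but for FASTQ it is only an upper bound, because a quality line can also begin with `@`. A minimal Python sketch for counting records more reliably, assuming unwrapped four-line FASTQ records; `count_fasta_records` and `count_fastq_records` are hypothetical helpers, not part of C3POa.

```python
from pathlib import Path


def count_fasta_records(text: str) -> int:
    # Each FASTA record begins with a single '>' header line.
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def count_fastq_records(text: str) -> int:
    # Assumes unwrapped FASTQ: exactly 4 lines per record (@header, sequence, '+', qualities).
    lines = text.splitlines()
    while lines and not lines[-1]:
        lines.pop()  # ignore trailing blank lines
    if len(lines) % 4:
        raise ValueError("line count is not a multiple of 4; FASTQ may be wrapped or truncated")
    return len(lines) // 4


# A quality string can itself start with '@' (Phred+33 score 31), so counting
# lines that match '^@' can overcount FASTQ records.
example = "@read1\nACGT\n+\n@III\n@read2\nGG\n+\nII\n"
naive = sum(1 for line in example.splitlines() if line.startswith("@"))
print(naive, count_fastq_records(example))  # 3 2

# Hypothetical usage against the output layout shown above (does nothing if out/ is absent).
for path in sorted(Path("out").glob("10x_Splint_*/*")):
    text = path.read_text()
    n = count_fastq_records(text) if path.suffix == ".fastq" else count_fasta_records(text)
    print(path, n)
```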

These counts seem awfully low to me, but it's not clear where the reads are getting lost. Shouldn't the total number of subreads add up to the number of reads after preprocessing? And assuming 5-10 passes per consensus read, shouldn't there be somewhere between 14k and 30k consensus reads? Is there a way to see what is happening to the rest of the reads, or is my understanding simply incorrect?
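The estimate in the question can be checked against the observed totals. A minimal Python sketch: the counts are copied from the log and grep output above, and the 5-10 passes per consensus read is the question's own assumption, not a C3POa parameter.

```python
# Counts copied from the grep output above. The FASTQ (subread) counts come from
# '^@' line matches, so they are upper bounds on the true record counts.
consensus = {"10x_Splint_1": 4, "10x_Splint_2": 1, "10x_Splint_3": 19, "10x_Splint_4": 13}
subreads = {"10x_Splint_1": 96, "10x_Splint_2": 60, "10x_Splint_3": 282, "10x_Splint_4": 325}
reads_after_preprocessing = 138694  # from c3poa.log

total_consensus = sum(consensus.values())
total_subreads = sum(subreads.values())

# The question's estimate: one consensus read per 5-10 passes.
low = reads_after_preprocessing // 10
high = reads_after_preprocessing // 5

print(f"consensus reads: {total_consensus} (expected roughly {low}-{high})")
# -> consensus reads: 37 (expected roughly 13869-27738)
share = 100 * total_subreads / reads_after_preprocessing
print(f"subreads: {total_subreads} ({share:.2f}% of reads after preprocessing)")
# -> subreads: 763 (0.55% of reads after preprocessing)
```

Even counting every subread as a separate read, the four splint folders together account for well under 1% of the 138,694 reads that passed preprocessing.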

Thanks, -Kiran

kvg, Jun 02 '21 06:06

Hello Kiran,

I'm trying to analyze long-read data from an ONT sequencer with the C3POa workflow, but the run does not continue past preprocessing and ends at the "Calling consensi" step. All the tools are installed with their dependencies, and I prepared the UMI_Splint.fasta used in the experiment, but the process still stops as shown below:

Command:

(base) [ukhussein@ldragon3 C3POa-2.2.3]$ python3 C3POa.py -r ../../projects/nanopore_R2C2/10X_071_R2C2/test/dngqu0264_71_fastq_pass.tar.gz -s ./UMI_Splint.fasta/UMI_Splints.fasta -d 500 -l 100 -g 1000 -n 32 -o out2

Output: [screenshots of the abpoa and pre-processing output]

Log contents:

$ cat out/c3poa.log
C3POa version: v2.2.3
Total reads: 1687451
No splint reads: 1505291 (89.21%)
Under len cutoff: 15 (0.00%)
Total thrown away reads: 1505306 (89.21%)
Reads after preprocessing: 182145

Could you please help me figure out what the problem is?

Usamahussein551980, Nov 03 '23 09:11