C3POa
Calling Consensi Hang
Hi, I'm trying to replicate the B-Cell data from your 2018 paper (lovely work btw!)
I've downloaded the SRA data, so I'm working with SRR6924616_R2C2_full_length_cDNA_sequencing_of_single_human_B_cells_1.fastq.gz.
I've cloned your repo and run the setup without issue. Racon and BLAT are installed via conda and available on the PATH.
When running the command (using a subset of the data or the whole enchilada), it seems to hang on the "Calling consensi" portion for a long period of time, then the script finishes. The splint.fasta is the one included in your repo. The output directory contains a splint_to_read_alignments.psl file which is sizeable, but R2C2_Consensus.fasta and R2C2_Subreads.fastq are empty.
Command:
python3 C3POa.py \
-r ../../Data/R2C2/SRR6924616_R2C2_full_length_cDNA_sequencing_of_single_human_B_cells_1.fastq.gz \
-o ../C3POa_All -s splint.fasta -l 1000 -d 500 -n 8 -g 1000
Log Contents:
C3POa version: v2.2.3
Total reads: 2873159
No splint reads: 796038 (27.71%)
Under len cutoff: 668197 (23.26%)
Total thrown away reads: 1464235 (50.96%)
Reads after preprocessing: 1408924
Output:
Aligning splints to reads with blat
Preprocessing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2205/2205 [19:48<00:00, 1.85it/s]
Catting psls: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2205/2205 [00:22<00:00, 99.51it/s]
Removing preprocessing files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2205/2205 [00:00<00:00, 2347.87it/s]
Calling consensi: 0%| | 0/2205 [1:44:10<?, ?it/s]
Catting consensus reads: 0it [00:00, ?it/s]
Catting subreads: 0it [00:00, ?it/s]
Removing files: 0it [00:00, ?it/s]
System shows 100% usage and those 8 threads actively working, even with no output on the Consensus Calling. Any advice would be very welcome / obvious errors I could be making! (and thanks for maintaining this repo).
As an aside, it would be really great if you were able to host the final bam for this paper on something like Zenodo (to compare pipeline outputs), but obviously understand that's outside the scope of this Github issue!
Thanks,
Andrew
Reading through some other issues, I adjusted the Python file to add .get() to line 248:
if current_num == target:
    pool.apply_async(analyze_reads,
        args=(args, tmp_reads, splint_dict, adapter_dict, adapter_set, iteration, racon),
        callback=lambda _: pbar.update(1)
    ).get()
This gave the following traceback:
Reading existing psl file
Calling consensi: 0%| | 0/2205 [00:00<?, ?it/s]multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/andrewskelton/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/Users/andrewskelton/Tmp/Project_Data/IgA_Long/MinION/Software/C3POa/C3POa.py", line 123, in analyze_reads
scores = conk.conk(splint, seq, penalty)
AttributeError: module 'conk.conk' has no attribute 'conk'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C3POa.py", line 280, in <module>
main(args)
File "C3POa.py", line 247, in main
callback=lambda _: pbar.update(1)
File "/Users/andrewskelton/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
AttributeError: module 'conk.conk' has no attribute 'conk'
Calling consensi: 0%| | 0/2205 [00:07<?, ?it/s]
So it must be something to do with conk; however, starting python3 and running import conk as conk runs fine.
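The traceback says that conk.conk resolves to a module with no conk attribute, so a successful import conk on its own does not rule the package out. One way to narrow this down is to check which file a module was loaded from and which names it exposes. The following is only a sketch: the helper name describe_module is made up for illustration, the standard-library json package is used as a stand-in, and nothing here assumes how conk is laid out internally.

```python
import importlib
import types


def describe_module(name):
    """Report where a module was loaded from and which public names it exposes.

    `name` may be a dotted path such as "package.submodule".
    """
    module = importlib.import_module(name)
    public = sorted(n for n in dir(module) if not n.startswith("_"))
    # Names that are themselves modules (i.e. submodules already imported).
    submodules = [
        n for n in public if isinstance(getattr(module, n), types.ModuleType)
    ]
    return {
        "file": getattr(module, "__file__", None),
        "public": public,
        "submodules": submodules,
    }


if __name__ == "__main__":
    # "json" is only a stand-in; substitute the module named in the traceback.
    info = describe_module("json")
    print("loaded from:", info["file"])
    print("submodules:", info["submodules"])
    print("public names:", info["public"])
```

Calling describe_module("conk.conk") with the same interpreter that runs C3POa.py should show whether a conk name is present at all and which file was actually imported.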
I'm a bit stumped from here, so any points in the right direction would be much appreciated!
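For anyone else hitting the silent version of this: apply_async stores a worker's exception in the returned result object and only raises it when .get() is called. The callback fires only on success, which is a plausible reason the progress bar stayed at 0 with no error shown. Below is a minimal sketch of two ways to surface such errors. It is not C3POa's code: failing_task, run_failing_tasks and get_reraises are illustrative names, and it uses multiprocessing.pool.ThreadPool (which exposes the same apply_async interface as Pool) so that it runs without a __main__ guard.

```python
from multiprocessing.pool import ThreadPool


def failing_task(x):
    """Stand-in for a worker function that raises."""
    raise AttributeError(f"simulated failure for input {x}")


def run_failing_tasks(n_tasks=3):
    """Submit tasks that all fail; collect successes and errors separately."""
    completed, errors = [], []
    with ThreadPool(2) as pool:
        results = [
            pool.apply_async(
                failing_task,
                args=(i,),
                callback=completed.append,     # called only on success
                error_callback=errors.append,  # called only on failure
            )
            for i in range(n_tasks)
        ]
        # wait() blocks until each task has finished but does not re-raise.
        for result in results:
            result.wait()
    return completed, errors


def get_reraises():
    """Show that .get() re-raises the worker's exception in the caller."""
    with ThreadPool(1) as pool:
        result = pool.apply_async(failing_task, args=(0,))
        try:
            result.get(timeout=10)
        except AttributeError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    done, failed = run_failing_tasks()
    print(f"{len(done)} succeeded, {len(failed)} failed")
    for exc in failed:
        print("worker error:", repr(exc))
    print("re-raised by get():", get_reraises())
```

Compared with calling .get() right after each apply_async, which blocks until that task finishes and so runs the tasks one at a time, an error_callback keeps the pool running in parallel while still reporting failures.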
Hi Andrew,
I'm new to analysing long-read 10x single-cell data generated on an ONT sequencer with the C3POa workflow. My run does not continue past pre-processing and finishes at "Calling consensi", exactly as you described in your issue. The tools are installed with their dependencies and I prepared the UMI_Splint.fasta used in the experiment, but the process stopped as shown below:
Command:
(base) [ukhussein@ldragon3 C3POa-2.2.3]$ python3 C3POa.py -r ../../projects/nanopore_R2C2/10X_071_R2C2/test/dngqu0264_71_fastq_pass.tar.gz -s ./UMI_Splint.fasta/UMI_Splints.fasta -d 500 -l 100 -g 1000 -n 32 -o out2
abpoa
Output:
pr-processing
Log Contents:
$ cat/out/c3poa.log
C3POa version: v2.2.3
Total reads: 1687451
No splint reads: 1505291 (89.21%)
Under len cutoff: 15 (0.00%)
Total thrown away reads: 1505306 (89.21%)
Reads after preprocessing: 182145
Could you please help me figure out what the problem is?