Calling Consensi Hang

Open AndrewSkelton opened this issue 3 years ago • 2 comments

Hi, I'm trying to replicate the B-Cell data from your 2018 paper (lovely work btw!)

I've downloaded the SRA data, so I'm working with SRR6924616_R2C2_full_length_cDNA_sequencing_of_single_human_B_cells_1.fastq.gz.

I've cloned your repo and run the setup without issue, and installed Racon and BLAT via conda (both are available on the path).

When running the command (using a subset of the data or the whole enchilada), it seems to hang on the "Calling consensi" portion for a long period of time, then the script finishes. The splint.fasta is the one included in your repo. The output directory contains a splint_to_read_alignments.psl file which is sizeable, but the R2C2_Consensus.fasta & R2C2_Subreads.fastq are empty.

Command:

python3 C3POa.py \
                -r ../../Data/R2C2/SRR6924616_R2C2_full_length_cDNA_sequencing_of_single_human_B_cells_1.fastq.gz \
                -o ../C3POa_All -s splint.fasta -l 1000 -d 500 -n 8 -g 1000

Log Contents:

C3POa version: v2.2.3
Total reads: 2873159
No splint reads: 796038 (27.71%)
Under len cutoff: 668197 (23.26%)
Total thrown away reads: 1464235 (50.96%)
Reads after preprocessing: 1408924

Output:

Aligning splints to reads with blat
Preprocessing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2205/2205 [19:48<00:00,  1.85it/s]
Catting psls: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2205/2205 [00:22<00:00, 99.51it/s]
Removing preprocessing files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2205/2205 [00:00<00:00, 2347.87it/s]
Calling consensi:   0%|                                                                                                                                                                                  | 0/2205 [1:44:10<?, ?it/s]
Catting consensus reads: 0it [00:00, ?it/s]
Catting subreads: 0it [00:00, ?it/s]
Removing files: 0it [00:00, ?it/s]

The system shows 100% usage with those 8 threads actively working, even though the consensus-calling step produces no output. Any advice, or pointers to obvious errors I could be making, would be very welcome! (And thanks for maintaining this repo.)

As an aside, it would be really great if you were able to host the final BAM for this paper on something like Zenodo (to compare pipeline outputs), but I obviously understand that's outside the scope of this GitHub issue!

Thanks,

Andrew

AndrewSkelton, Jul 12 '21 19:07

After reading through some other issues, I edited C3POa.py to add .get() to the apply_async call, at line 248:

        if current_num == target:
            pool.apply_async(analyze_reads,
                args=(args, tmp_reads, splint_dict, adapter_dict, adapter_set, iteration, racon),
                callback=lambda _: pbar.update(1)
            ).get()
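Some context on why the .get() makes a difference: apply_async returns immediately, and an exception raised inside a worker is stored on the returned result object rather than printed. Without .get() (or an error_callback), a failing task goes unnoticed, the success callback never fires, and the progress bar never advances. Calling .get() right after each submission also blocks until that task finishes, which is fine for debugging but serializes the work. Below is a minimal sketch of the error_callback alternative. It is not taken from C3POa: failing_task is a hypothetical stand-in for analyze_reads, and ThreadPool is used only so the snippet runs anywhere (it shares the apply_async interface with multiprocessing.Pool).

```python
from multiprocessing.pool import ThreadPool


def failing_task(x):
    # Hypothetical stand-in for analyze_reads: always raises.
    raise AttributeError("simulated failure for input %r" % (x,))


def run_silently():
    """Submit with only a success callback: the failure is never surfaced."""
    completed = []
    with ThreadPool(2) as pool:
        result = pool.apply_async(failing_task, (1,), callback=completed.append)
        result.wait()  # wait for the task to finish, without re-raising
    return completed  # stays empty because the success callback never ran


def run_with_error_callback():
    """Same submission, but failures are collected through error_callback."""
    completed, errors = [], []
    with ThreadPool(2) as pool:
        result = pool.apply_async(
            failing_task,
            (1,),
            callback=completed.append,
            error_callback=errors.append,  # receives the exception instance
        )
        result.wait()
    return completed, errors


if __name__ == "__main__":
    print(run_silently())
    print(run_with_error_callback())
```

In C3POa.py this would presumably mean passing an error_callback alongside the existing callback, so that worker errors are reported while the pool keeps running in parallel.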

With that change, the run gave the following traceback:

Reading existing psl file
Calling consensi:   0%|                                                                                                                                                                                    | 0/2205 [00:00<?, ?it/s]multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/andrewskelton/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/Users/andrewskelton/Tmp/Project_Data/IgA_Long/MinION/Software/C3POa/C3POa.py", line 123, in analyze_reads
    scores = conk.conk(splint, seq, penalty)
AttributeError: module 'conk.conk' has no attribute 'conk'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C3POa.py", line 280, in <module>
    main(args)
  File "C3POa.py", line 247, in main
    callback=lambda _: pbar.update(1)
  File "/Users/andrewskelton/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AttributeError: module 'conk.conk' has no attribute 'conk'
Calling consensi:   0%|                                                                                                                                                                                    | 0/2205 [00:07<?, ?it/s]

So it must be something to do with conk. However, starting python3 and running import conk as conk works fine.

I'm a bit stumped from here, so any points in the right direction would be much appreciated!
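One check that may narrow this down: the error names the module conk.conk, so a bare import conk succeeding only shows that the top-level package is importable, not that the inner module exposes a conk function. Below is a small sketch that reports where a module was loaded from and whether it has the expected attribute. inspect_attribute is a hypothetical helper, and it is demonstrated on a standard-library module so that it runs without conk installed.

```python
import importlib


def inspect_attribute(module_name, attr_name):
    """Return (path the module was loaded from, whether it defines attr_name)."""
    module = importlib.import_module(module_name)
    location = getattr(module, "__file__", None)  # None for some built-in modules
    return location, hasattr(module, attr_name)


if __name__ == "__main__":
    # Demonstration on the standard library.
    print(inspect_attribute("os.path", "join"))

    # For the case in this thread one would call (requires conk to be installed):
    # print(inspect_attribute("conk.conk", "conk"))
```

If the reported path is not the file you expect, or the attribute is missing, the compiled part of conk may not have been built or picked up. That is a guess, not a confirmed diagnosis.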

AndrewSkelton, Jul 13 '21 09:07

Hi Andrew,

I'm new to analysing long-read 10x single-cell data generated on an ONT sequencer with the C3POa workflow, and I'm hitting the same problem: preprocessing does not continue and the run finishes at "Calling consensi", exactly as you describe in your issue. The tools and their dependencies are installed, and I prepared the UMI_Splint.fasta used in the experiment, but the process stops as shown below:

Command:

(base) [ukhussein@ldragon3 C3POa-2.2.3]$ python3 C3POa.py -r ../../projects/nanopore_R2C2/10X_071_R2C2/test/dngqu0264_71_fastq_pass.tar.gz -s ./UMI_Splint.fasta/UMI_Splints.fasta -d 500 -l 100 -g 1000 -n 32 -o out2

[Screenshot: abpoa]

Output:

[Screenshot: pr-processing]

Log Contents:

$ cat/out/c3poa.log
C3POa version: v2.2.3
Total reads: 1687451
No splint reads: 1505291 (89.21%)
Under len cutoff: 15 (0.00%)
Total thrown away reads: 1505306 (89.21%)
Reads after preprocessing: 182145

Could you please help me figure out what the problem is?

Usamahussein551980, Nov 03 '23 09:11