
Transform stage occasionally crashes with >16 threads

Open Mikes92 opened this issue 4 months ago • 22 comments

Hello!

I'm trying to run stRainy on a collection of 14 related genomes to see if it will be useful to me in the future. I sequenced these 14 genomes independently and know what their genomes should look like in the end. Maybe this dataset is too large and I should be using strainyMAG to reduce the number of edges? The error is below:

== == Processing unitig edge_248_s2 == ==                                                      
[2024-03-03 22:04:09] [Thread 262513] INFO:  ### Reading SNPs...                                                                                                                                                 
[2024-03-03 22:04:09] [Thread 262513] INFO:  ### Reading Reads...                                                                                                                                                
[2024-03-03 22:04:12] [Thread 262430] INFO:  Split stage2: Break regions of low heterozygosity                                                                                                                   
[2024-03-03 22:04:22] [Thread 262513] INFO:  ### Calculatind distances/Building adj matrix...                                                                                                                    
[2024-03-03 22:04:22] [Thread 262415] INFO:  ### Calculatind distances/Building adj matrix...                                                                                                                    
[2024-03-03 22:04:28] [Thread 262513] INFO:  ### Removing overweighed egdes...                                                                                                                                   
[2024-03-03 22:04:28] [Thread 262513] INFO:  ### Creating graph...                                                                                                                                               
[2024-03-03 22:04:28] [Thread 262513] INFO:  ### Searching clusters...                                                                                                                                           
[2024-03-03 22:04:28] [Thread 262513] INFO:  5 clusters found                                                                                                                                                    
[2024-03-03 22:04:28] [Thread 262513] INFO:  ### Cluster post-processing...                                                                                                                                      
[2024-03-03 22:04:31] [Thread 262453] INFO:  Split stage2: Break regions of low heterozygosity                                                                                                                   
[2024-03-03 22:04:36] [Thread 262453] WARNING:  WARNING: error reading back the flye output, defaulting to empty sequence for consensus                                                                          
/mnt/nfs/home/mikes92/miniconda3/envs/strainy/lib/python3.10/site-packages/Bio/SeqRecord.py:229: BiopythonDeprecationWarning: Using a string as the sequence is deprecated and will raise a TypeError in future. 
It has been converted to a Seq object.              
  warnings.warn(                                    
[2024-03-03 22:04:36] [Thread 262453] ERROR:  Worker thread exception! [Errno 2] No such file or directory: '/scratch/mikes92/strainy_sar11/strainy_sar11/flye_outputs/flye_consensus_edge_248_s1_2010009_7143/base_coverage.bed.gz'
Traceback (most recent call last):                                                                      
  File "/mnt/nfs/home/mikes92/stRainy/strainy/phase.py", line 30, in _thread_fun                                                                                                                                 
    cluster(i, shared_flye_consensus)                                                                   
  File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster.py", line 137, in cluster                                                                                                                       
    cl = postprocess(StRainyArgs().bam, cl, SNP_pos, data, edge, R, I, flye_consensus)                                                                                                                           
  File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster_postprocess.py", line 279, in postprocess                                                                                                       
    cl = join_clusters(cons, cl, R, edge, flye_consensus)                                               
  File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster_postprocess.py", line 104, in join_clusters                                                                                                     
    M = build_adj_matrix_clusters(edge,cons, cl,consensus, True)                                                                                                                                                 
  File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster_postprocess.py", line 90, in build_adj_matrix_clusters                                                                                          
    m[second_cl][first_cl] = matrix.distance_clusters(edge, first_cl, second_cl, cons, cl,flye_consensus, only_with_common_snip)                                                                                 
  File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/build_adj_matrix.py", line 115, in distance_clusters                                                                                                    
    d = flye_consensus.cluster_distance_via_alignment(first_cl, second_cl, cl, edge, commonSNP)                                                                                                                  
  File "/mnt/nfs/home/mikes92/stRainy/strainy/flye_consensus.py", line 426, in cluster_distance_via_alignment                                                                                                    
    second_cl_dict = self.flye_consensus(second_cl, edge, cl, debug)                                                                                                                                             
  File "/mnt/nfs/home/mikes92/stRainy/strainy/flye_consensus.py", line 233, in flye_consensus                                                                                                                    
    bed_content = self._parse_bed_coverage(f"{StRainyArgs().output}/flye_outputs/flye_consensus_{edge}_{cluster}_{salt}/"                                                                                        
  File "/mnt/nfs/home/mikes92/stRainy/strainy/flye_consensus.py", line 272, in _parse_bed_coverage                                                                                                               
    with gzip.open(filename, 'r') as f:                                                                 
  File "/mnt/nfs/home/mikes92/miniconda3/envs/strainy/lib/python3.10/gzip.py", line 58, in open                                                                                                                  
    binary_file = GzipFile(filename, gz_mode, compresslevel)                                                                                                                                                     
  File "/mnt/nfs/home/mikes92/miniconda3/envs/strainy/lib/python3.10/gzip.py", line 174, in __init__                                                                                                             
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')                                                                                                                                             
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/mikes92/strainy_sar11/strainy_sar11/flye_outputs/flye_consensus_edge_248_s1_2010009_7143/base_coverage.bed.gz'   

Traceback (most recent call last):                                                                      
  File "/mnt/nfs/home/mikes92/stRainy/./strainy.py", line 89, in <module>                                                                                                                                        
    main()                                          
  File "/mnt/nfs/home/mikes92/stRainy/./strainy.py", line 82, in main                                                                                                                                            
    phase_main(args)                                
  File "/mnt/nfs/home/mikes92/stRainy/strainy/phase.py", line 114, in phase_main                                                                                                                                 
    consensus_dict = phase(StRainyArgs().edges, args)                                                   
  File "/mnt/nfs/home/mikes92/stRainy/strainy/phase.py", line 55, in phase                                                                                                                                       
    raise Exception("Error in worker thread, exiting")                                                  
Exception: Error in worker thread, exiting

When I look into the output, the other flye_consensus_edge folders contain the expected base_coverage.bed.gz file, but the folder named in the error contains a bubbles_1.fasta file instead. A polished_1.fasta also exists there. Both of these files (bubbles_1.fasta and polished_1.fasta) are empty.
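The manual check described above can be automated to see how many consensus folders are affected. The following is a minimal sketch, not part of stRainy: the function name `find_incomplete_flye_dirs` is hypothetical, and it assumes only what the traceback shows, namely that consensus folders live under `<output>/flye_outputs/` and are named `flye_consensus_*`.

```python
from pathlib import Path


def find_incomplete_flye_dirs(flye_outputs, expected="base_coverage.bed.gz"):
    """Return (folder_name, empty_file_names) for folders lacking `expected`.

    Hypothetical helper: the folder layout is inferred from the paths in the
    traceback, not from stRainy's documentation.
    """
    incomplete = []
    for folder in sorted(Path(flye_outputs).glob("flye_consensus_*")):
        # Skip stray files and folders that already have the coverage file.
        if not folder.is_dir() or (folder / expected).exists():
            continue
        # Collect zero-byte files, e.g. an empty bubbles_1.fasta.
        empty = sorted(
            p.name for p in folder.iterdir()
            if p.is_file() and p.stat().st_size == 0
        )
        incomplete.append((folder.name, empty))
    return incomplete
```

Calling `find_incomplete_flye_dirs("/scratch/mikes92/strainy_sar11/strainy_sar11/flye_outputs")` would list every folder missing base_coverage.bed.gz along with its empty files, which helps show whether the failure is isolated to one cluster or more widespread.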

I installed stRainy with conda and performed the initial assembly with the suggested Flye parameters. I was able to run stRainy successfully on the test dataset that came with it.

I'm running on Linux with this command:

./strainy.py -g /scratch/mikes92/strainy_sar11/assembly_graph.gfa -q /scratch/mikes92/strainy_sar11/uisw_sar11_reads.fastq.gz -o /scratch/mikes92/strainy_sar11/strainy_test -m nano -t 20

Thanks!

Mike

Mikes92, Mar 04 '24 17:03