Transform stage occasionally crashes with >16 threads
Hello!
I'm trying to run stRainy on a collection of 14 related genomes to see whether it will be useful to me in the future. I sequenced these 14 genomes independently, so I know what the resulting genomes should look like. Maybe this dataset is too large and I should be using strainyMAG to reduce the number of edges? The error is below:
== == Processing unitig edge_248_s2 == ==
[2024-03-03 22:04:09] [Thread 262513] INFO: ### Reading SNPs...
[2024-03-03 22:04:09] [Thread 262513] INFO: ### Reading Reads...
[2024-03-03 22:04:12] [Thread 262430] INFO: Split stage2: Break regions of low heterozygosity
[2024-03-03 22:04:22] [Thread 262513] INFO: ### Calculatind distances/Building adj matrix...
[2024-03-03 22:04:22] [Thread 262415] INFO: ### Calculatind distances/Building adj matrix...
[2024-03-03 22:04:28] [Thread 262513] INFO: ### Removing overweighed egdes...
[2024-03-03 22:04:28] [Thread 262513] INFO: ### Creating graph...
[2024-03-03 22:04:28] [Thread 262513] INFO: ### Searching clusters...
[2024-03-03 22:04:28] [Thread 262513] INFO: 5 clusters found
[2024-03-03 22:04:28] [Thread 262513] INFO: ### Cluster post-processing...
[2024-03-03 22:04:31] [Thread 262453] INFO: Split stage2: Break regions of low heterozygosity
[2024-03-03 22:04:36] [Thread 262453] WARNING: WARNING: error reading back the flye output, defaulting to empty sequence for consensus
/mnt/nfs/home/mikes92/miniconda3/envs/strainy/lib/python3.10/site-packages/Bio/SeqRecord.py:229: BiopythonDeprecationWarning: Using a string as the sequence is deprecated and will raise a TypeError in future.
It has been converted to a Seq object.
warnings.warn(
[2024-03-03 22:04:36] [Thread 262453] ERROR: Worker thread exception! [Errno 2] No such file or directory: '/scratch/mikes92/strainy_sar11/strainy_sar11/flye_outputs/flye_consensus_edge_248_s1_2010009_7143/base_coverage.bed.gz'
Traceback (most recent call last):
File "/mnt/nfs/home/mikes92/stRainy/strainy/phase.py", line 30, in _thread_fun
cluster(i, shared_flye_consensus)
File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster.py", line 137, in cluster
cl = postprocess(StRainyArgs().bam, cl, SNP_pos, data, edge, R, I, flye_consensus)
File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster_postprocess.py", line 279, in postprocess
cl = join_clusters(cons, cl, R, edge, flye_consensus)
File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster_postprocess.py", line 104, in join_clusters
M = build_adj_matrix_clusters(edge,cons, cl,consensus, True)
File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/cluster_postprocess.py", line 90, in build_adj_matrix_clusters
m[second_cl][first_cl] = matrix.distance_clusters(edge, first_cl, second_cl, cons, cl,flye_consensus, only_with_common_snip)
File "/mnt/nfs/home/mikes92/stRainy/strainy/clustering/build_adj_matrix.py", line 115, in distance_clusters
d = flye_consensus.cluster_distance_via_alignment(first_cl, second_cl, cl, edge, commonSNP)
File "/mnt/nfs/home/mikes92/stRainy/strainy/flye_consensus.py", line 426, in cluster_distance_via_alignment
second_cl_dict = self.flye_consensus(second_cl, edge, cl, debug)
File "/mnt/nfs/home/mikes92/stRainy/strainy/flye_consensus.py", line 233, in flye_consensus
bed_content = self._parse_bed_coverage(f"{StRainyArgs().output}/flye_outputs/flye_consensus_{edge}_{cluster}_{salt}/"
File "/mnt/nfs/home/mikes92/stRainy/strainy/flye_consensus.py", line 272, in _parse_bed_coverage
with gzip.open(filename, 'r') as f:
File "/mnt/nfs/home/mikes92/miniconda3/envs/strainy/lib/python3.10/gzip.py", line 58, in open
binary_file = GzipFile(filename, gz_mode, compresslevel)
File "/mnt/nfs/home/mikes92/miniconda3/envs/strainy/lib/python3.10/gzip.py", line 174, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/mikes92/strainy_sar11/strainy_sar11/flye_outputs/flye_consensus_edge_248_s1_2010009_7143/base_coverage.bed.gz'
Traceback (most recent call last):
File "/mnt/nfs/home/mikes92/stRainy/./strainy.py", line 89, in <module>
main()
File "/mnt/nfs/home/mikes92/stRainy/./strainy.py", line 82, in main
phase_main(args)
File "/mnt/nfs/home/mikes92/stRainy/strainy/phase.py", line 114, in phase_main
consensus_dict = phase(StRainyArgs().edges, args)
File "/mnt/nfs/home/mikes92/stRainy/strainy/phase.py", line 55, in phase
raise Exception("Error in worker thread, exiting")
Exception: Error in worker thread, exiting
When I look at the output, the other flye_consensus_edge folders contain the expected base_coverage.bed.gz file, but the folder named in the error contains a bubbles_1.fasta file instead. A polished_1.fasta file also exists there. However, both of these files (bubbles_1.fasta and polished_1.fasta) are empty.
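The folder check described above can be scripted to see whether more than one consensus folder is affected. This is a minimal sketch, assuming the layout seen in the traceback (`<output>/flye_outputs/flye_consensus_*/base_coverage.bed.gz`). The function name `find_incomplete_flye_dirs` is hypothetical and not part of stRainy.

```python
import sys
from pathlib import Path


def find_incomplete_flye_dirs(flye_outputs):
    """List flye_consensus_* folders that lack base_coverage.bed.gz.

    Returns (folder name, names of zero-byte .fasta files) tuples.
    The directory layout is an assumption taken from the traceback paths.
    """
    incomplete = []
    for folder in sorted(Path(flye_outputs).glob("flye_consensus_*")):
        if not folder.is_dir():
            continue
        if (folder / "base_coverage.bed.gz").exists():
            continue  # this folder has the coverage file
        empty_fastas = sorted(
            p.name for p in folder.glob("*.fasta") if p.stat().st_size == 0
        )
        incomplete.append((folder.name, empty_fastas))
    return incomplete


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the flye_outputs directory as the first argument.
    for name, empty_fastas in find_incomplete_flye_dirs(sys.argv[1]):
        print(name, "| empty fasta files:", ", ".join(empty_fastas) or "none")
```

Run it with the `flye_outputs` directory as the only argument. Each printed line is a folder without a coverage file, followed by any zero-byte FASTA files found in it.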
I installed stRainy with conda, performed the initial assembly with the suggested Flye parameters, and was able to run stRainy successfully on the test dataset that came with it.
I'm running it on Linux with the following command:
./strainy.py -g /scratch/mikes92/strainy_sar11/assembly_graph.gfa -q /scratch/mikes92/strainy_sar11/uisw_sar11_reads.fastq.gz -o /scratch/mikes92/strainy_sar11/strainy_test -m nano -t 20
Thanks!
Mike