metaviral fails without error notification

Open geboro opened this issue 4 years ago • 11 comments

I'm running metaviral SPAdes on low-complexity samples (2-5 viruses). It works for most of them, but three consistently fail without warnings. The log file reports no errors, and scaffolds.fasta and contigs.fasta are produced, but both are empty (size zero). I had previously recovered complete viral genomes (19 kbp) with SPAdes 3.13 from these very same samples, but with v3.15, even using normal spades, I get hundreds of scaffolds <3000 bp.

In the output I only find assembly_graph_after_simplification.gfa, but none of the final assembly graph files. In the K99 directory I find the edges_before_XXX.fasta files, but all the components and final_contigs files are empty, so I guess something is failing in this step.

Cheers!

geboro avatar Jan 17 '21 13:01 geboro

Tagging metaviralSPAdes' author for the troubleshooting @Dmitry-Antipov

asl avatar Jan 17 '21 13:01 asl

Hi. Could you please send us the spades.log file? (either to [email protected] or as an attachment here)

Dmitry-Antipov avatar Jan 18 '21 10:01 Dmitry-Antipov

Thanks. Here it is. spades.log Cheers!

geboro avatar Jan 20 '21 10:01 geboro

I have run into the same issue as @geboro. I tested several SPAdes modes on 25 isolate MiSeq viral libraries. All modes produce a contigs.fasta file except with the --metaviral flag, in which case the file is empty for several samples. SPAdes runs to completion without errors. There is a warning about the insert length, but this appears in the log files for successful runs as well. spades.log

snayfach avatar Jul 21 '22 15:07 snayfach

Hi. It is actually normal for metaviralSPAdes not to detect any viral-like contigs in samples that have no circular (or specific linear) paths meeting certain conditions on coverage and length, but this should not happen for isolate viral libraries. Is it possible that these libraries contain quasispecies or groups of related species? This can be seen if you look at (or send us) the graph before all metaviral procedures - .../gdFB431/K127/assembly_graph_after_simplification.gfa

In this specific case you have very high average coverage - that may also prevent metaviralSPAdes from finding complete viruses in the data.

Dmitry-Antipov avatar Jul 22 '22 09:07 Dmitry-Antipov

I don't think the high coverage is a problem. Many other libraries had similar coverage (500-1000x) and finished without issues.

I've attached the assembly graph file. I'd be very interested in determining if this (or other) libraries contained closely related, but distinct viral strains/species.

Update: I took a look at the assembly_graph_after_simplification.gfa files. In the two libraries that failed to yield a finished assembly, there was a low ratio segments (S) to links (L) (mean=4.25) relative to the rest of the libraries (mean=38x). I think this answers the question and indicates that there was strain variation in these two libraries, but I'd welcome any insights you may have.
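For readers unfamiliar with the format: in a GFA file, each `S` line describes a segment (a sequence in the assembly graph) and each `L` line describes a link between two segments. A graph made mostly of isolated contigs has many segments per link, while a tangled graph has few. Below is a minimal sketch of how such a ratio could be computed; the function names are hypothetical, and the ratio is only a rough indicator, not something SPAdes itself reports.

```python
from collections import Counter
from typing import Iterable


def gfa_record_counts(lines: Iterable[str]) -> Counter:
    """Count GFA records by their one-letter type (first tab-separated field)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line.split("\t", 1)[0]] += 1
    return counts


def segment_to_link_ratio(lines: Iterable[str]) -> float:
    """Return (#S records) / (#L records); infinity if the graph has no links."""
    counts = gfa_record_counts(lines)
    if counts["L"] == 0:
        return float("inf")
    return counts["S"] / counts["L"]
```

To use it, pass an open file handle, e.g. `segment_to_link_ratio(open("assembly_graph_after_simplification.gfa"))`.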

snayfach avatar Jul 22 '22 15:07 snayfach

Yes, this looks like a case with multiple strains - we can see three bulges of similar length and a complex region in the graph.

With lower coverage, metaviralSPAdes could output one of these strains (the one with higher coverage), but here the coverage is too high - metaviralSPAdes has a 600x cutoff for the edge removal procedure.

As for the segment-to-link ratio - it more likely reflects the low-coverage trash contigs (which were removed from the picture above). There are lots of isolated trash contigs with low coverage, and the higher the dataset coverage, the more of them there will be.

Dmitry-Antipov avatar Jul 23 '22 21:07 Dmitry-Antipov

Thanks, this is resolved as far as I'm concerned. I might suggest adding a warning or something to the log file that indicates why no final assembly is output. That might help future users.

Also, if you have any pointers for extracting this information from the assembly graph (number and size of bulges), that would be great. The goal is to flag assemblies that might contain multiple strains.
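One rough way to approach this is sketched below. It treats a "simple bulge" as two or more segments that share the same single predecessor and the same single successor in the GFA graph. This is a simplistic heuristic under that assumption, not the procedure metaviralSPAdes uses internally, and all function names are hypothetical. It ignores coverage and will miss nested or more complex structures.

```python
from collections import defaultdict

FLIP = {"+": "-", "-": "+"}


def segment_length(fields):
    """Length from the sequence, or from an LN:i: tag if the sequence is '*'."""
    if fields[2] != "*":
        return len(fields[2])
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            return int(tag[5:])
    return 0


def parse_gfa(lines):
    """Return (segment lengths, successor map over oriented segments)."""
    lengths, succ = {}, defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            lengths[fields[1]] = segment_length(fields)
        elif fields[0] == "L":
            a, b = (fields[1], fields[2]), (fields[3], fields[4])
            succ[a].add(b)
            # Each link also holds on the opposite strand: rc(b) -> rc(a).
            succ[(b[0], FLIP[b[1]])].add((a[0], FLIP[a[1]]))
    return lengths, succ


def canonical(src, dst):
    """Pick one representative of a (src, dst) pair and its strand mirror."""
    mirror = ((dst[0], FLIP[dst[1]]), (src[0], FLIP[src[1]]))
    return min((src, dst), mirror)


def simple_bulges(lengths, succ):
    """List groups of >=2 segments sharing one predecessor and one successor."""
    pred = defaultdict(set)
    for node, targets in succ.items():
        for target in targets:
            pred[target].add(node)
    groups = defaultdict(set)
    for name in lengths:
        for orient in "+-":
            node = (name, orient)
            ins, outs = pred.get(node, ()), succ.get(node, ())
            if len(ins) == 1 and len(outs) == 1:
                key = canonical(next(iter(ins)), next(iter(outs)))
                groups[key].add(name)
    return [
        {"from": src, "to": dst, "branches": {n: lengths[n] for n in sorted(names)}}
        for (src, dst), names in sorted(groups.items())
        if len(names) >= 2
    ]
```

Calling `simple_bulges(*parse_gfa(open("assembly_graph_after_simplification.gfa")))` returns one entry per bulge with the length of each branch, which could then be filtered by size.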

snayfach avatar Jul 23 '22 21:07 snayfach

Note that SPAdes 3.15.4 includes dedicated diagnostics for empty output here. So it won't come as a surprise :)

asl avatar Aug 01 '22 07:08 asl

For the last assembly (shown above) I downsampled the library to 650x coverage and the program output a circular genome. However, I'm dealing with one last tricky phage library for which metaviralSPAdes won't complete even after downsampling. The coverage is ~500x and the expected genome size is ~75 Kbp. Looking at the assembly graph there are 6 small bulges <2 Kbp that do not have abnormally high coverage.
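The arithmetic behind downsampling to a target coverage can be sketched as follows. Coverage is approximated as total read bases divided by genome size, and the fraction of read pairs to keep is the target coverage divided by the current coverage. The function names are hypothetical and the random subsampling is a generic approach, not a SPAdes feature.

```python
import random


def estimate_coverage(total_read_bases: int, genome_size: int) -> float:
    """Approximate average coverage as sequenced bases per genome base."""
    return total_read_bases / genome_size


def keep_fraction(current_coverage: float, target_coverage: float) -> float:
    """Fraction of read pairs to keep to reach the target (never above 1.0)."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    return min(1.0, target_coverage / current_coverage)


def subsample_pairs(pairs, fraction: float, seed: int = 0):
    """Yield each read pair independently with probability `fraction`.

    A fixed seed makes the subsample reproducible; forward and reverse
    reads stay together because whole pairs are kept or dropped.
    """
    rng = random.Random(seed)
    for pair in pairs:
        if rng.random() < fraction:
            yield pair
```

For example, a library at 1000x with a 500x target gives `keep_fraction(1000, 500) == 0.5`, so roughly half of the read pairs would be retained.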

I've attached the log and assembly graph: spades.log assembly_graph_after_simplification.gfa.zip

snayfach avatar Aug 03 '22 04:08 snayfach

Hello, I am new to this. Does this mean we need to downsample our data? If so, does this affect the final assembly or even the number of viral species we could find? Sorry if this is a stupid question. I hope someone could also give me a reference for @snayfach's statement "there was a low ratio segments (S) to links (L) (mean=4.25) relative to the rest of the libraries (mean=38x)". I really don't understand this.

mchlou avatar Mar 04 '23 15:03 mchlou