Spades warning messages?

Open maesaar opened this issue 3 years ago • 8 comments

Hello @andersgs, I have run SPAdes 3.14.1 as included in shovill and I consistently get three types of warnings. These are also mentioned in #19. Are they benign, or how should they be addressed?

Thanks

=== Error correction and assembling warnings:

0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 2
0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 2
0:00:28.189 124M / 4G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 12
0:00:28.190 124M / 4G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 12
0:00:22.490 115M / 4G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 6
0:00:22.490 115M / 4G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 6
0:00:18.378 138M / 4G WARN General (kmer_coverage_model.cpp : 218) Too many erroneous kmers, the estimates might be unreliable
======= Warnings saved to /home/bioinf/Desktop/CJ_21122020/shovill/CAMP3H_S101/spades/warnings.lo

maesaar avatar Dec 23 '20 16:12 maesaar

@maesaar we have not dug too deeply into that yet. But, it does suggest some issue with the underlying FASTQ data. Is this warning associated with a particular sample? Are you able to share it?

andersgs avatar Dec 23 '20 19:12 andersgs

@andersgs I can share the FASTQs after the holidays. Is that OK with you?

maesaar avatar Dec 23 '20 20:12 maesaar

@andersgs can I email the link directly? For now I have chosen to use the SKESA assembler with shovill. Do you think it is a good alternative?

For background, the FASTQs are 4x2 files (four lanes, each with R1 and R2) concatenated as described in #144.

maesaar avatar Dec 23 '20 20:12 maesaar

Did you concatenate in the same order?

L1, L2, L3, L4 > R1
L1, L2, L3, L4 > R2

That may explain the issue you are observing.

And emailing a link directly is fine.
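
One way to test the ordering question on the concatenated files themselves is to compare the read IDs of R1 and R2 position by position. The following is only a sketch: `check_pairs` is a hypothetical helper name, and it assumes plain 4-line FASTQ records whose header starts with the read ID, optionally followed by a space or an old-style /1 or /2 suffix.

```shell
# Sketch: report whether two gzipped FASTQ files list the same read IDs
# in the same order. check_pairs is a hypothetical helper name.
# Assumes 4-line FASTQ records (no wrapped sequence lines).
check_pairs() {
    r1_ids=$(mktemp)
    r2_ids=$(mktemp)
    # Header lines are lines 1, 5, 9, ... of the stream. Keep the first
    # whitespace-delimited field, drop the leading "@" and any /1 or /2 suffix.
    gzip -dc "$1" | awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' > "$r1_ids"
    gzip -dc "$2" | awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' > "$r2_ids"
    # Identical ID lists mean every R1 read sits at the same position as its mate.
    if cmp -s "$r1_ids" "$r2_ids"; then
        echo "paired"
    else
        echo "mismatch"
    fi
    rm -f "$r1_ids" "$r2_ids"
}
```

Called as `check_pairs R1.fastq.gz R2.fastq.gz`, it prints "paired" when the two ID lists are identical and "mismatch" otherwise. `gzip -dc` reads through all members of a concatenated .gz file, so the check covers every lane.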

andersgs avatar Dec 23 '20 20:12 andersgs

The cat commands are as follows:

cat CAMP01-08H_S34_L001_R1_001.fastq.gz CAMP01-08H_S34_L002_R1_001.fastq.gz CAMP01-08H_S34_L003_R1_001.fastq.gz CAMP01-08H_S34_L004_R1_001.fastq.gz > R1.fastq.gz

cat CAMP01-08H_S34_L001_R2_001.fastq.gz CAMP01-08H_S34_L002_R2_001.fastq.gz CAMP01-08H_S34_L003_R2_001.fastq.gz CAMP01-08H_S34_L004_R2_001.fastq.gz > R2.fastq.gz

shovill --R1 R1.fastq.gz --R2 R2.fastq.gz --outdir CAMP01-08H_S34 --keepfiles --minlen 200 --ram 58 --trim

And the warning: [spades] * 0:00:18.097 65M / 3G WARN General (kmer_coverage_model.cpp : 218) Too many erroneous kmers, the estimates might be unreliable

maesaar avatar Dec 23 '20 21:12 maesaar

@andersgs I have emailed you the link to download the reads that produce the single warning message mentioned above.

If you need reads that produce the other warning messages, please let me know.

maesaar avatar Dec 23 '20 21:12 maesaar

@andersgs please see SPAdes issue #630, where the logs are posted, for additional information. Could you check why logs 2) and 3) in section a) give different SPAdes warnings? For log 2), the R1 and R2 FASTQs were concatenated first and then only trimmed in shovill, followed by a separate SPAdes run. For log 3), the 4 pairs were each only trimmed with shovill separately, and then the trimmed R1s and R2s were concatenated and used in a SPAdes run.

maesaar avatar Dec 26 '20 04:12 maesaar

These warnings are usually caused by uneven coverage, likely due to read subsampling or read correction. The results might be suboptimal, so expect misassemblies.
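
To judge whether subsampling is even in play, a rough average depth can be computed as total sequenced bases divided by genome size and compared with the target set by shovill's `--depth` option. This is a sketch only: `estimate_depth` is a hypothetical helper, the genome size in bp has to be supplied by the user, and 4-line FASTQ records are assumed.

```shell
# Sketch: estimate average read depth as total bases / genome size.
# estimate_depth is a hypothetical helper name; the first argument is the
# genome size in bp, the remaining arguments are gzipped FASTQ files.
estimate_depth() {
    gsize=$1
    shift
    # Sequence lines are lines 2, 6, 10, ... of the decompressed stream.
    gzip -dc "$@" | awk -v g="$gsize" 'NR % 4 == 2 { n += length($0) } END { printf "%.1f\n", n / g }'
}
```

For example, `estimate_depth 1700000 R1.fastq.gz R2.fastq.gz` prints the depth for an assumed genome size of 1.7 Mb; that number is a placeholder, not a value taken from this thread.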

asl avatar Aug 16 '21 12:08 asl