Warnings using shovill with spades

Open maesaar opened this issue 3 years ago • 9 comments

Hello @asl, I ran SPAdes 3.14.1 as bundled with shovill and I consistently get three types of warnings. Are they caused by the tools run earlier in the pipeline? Are they benign, or how should they be addressed?

Thanks

=== Error correction and assembling warnings:

  • 0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 2
  • 0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 2
  • 0:00:28.189 124M / 4G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 12
  • 0:00:28.190 124M / 4G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 12
  • 0:00:22.490 115M / 4G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 6
  • 0:00:22.490 115M / 4G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 6
  • 0:00:18.378 138M / 4G WARN General (kmer_coverage_model.cpp : 218) Too many erroneous kmers, the estimates might be unreliable

======= Warnings saved to /home/bioinf/Desktop/CJ_21122020/shovill/CAMP3H_S101/spades/warnings.log
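For anyone comparing several runs, the warning lines above can be tallied by source location. The following is a minimal sketch, assuming the log lines keep the layout shown in this issue (timestamp, memory figures, `WARN`, module, `(file : line)`, message); the function names `parse_warnings` and `count_by_source` are made up for illustration and are not part of SPAdes or shovill.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted in this issue:
# 0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 327) message...
WARN_RE = re.compile(
    r"(?P<time>\d+:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<mem>\S+)\s*/\s*(?P<peak>\S+)\s+"
    r"WARN\s+(?P<module>\S+)\s+"
    r"\((?P<file>[^:()]+?)\s*:\s*(?P<line>\d+)\)\s+"
    r"(?P<message>.*)"
)


def parse_warnings(text):
    """Return one dict per WARN line found in the log text."""
    records = []
    for raw in text.splitlines():
        match = WARN_RE.search(raw)  # search() tolerates bullets or indentation
        if match:
            record = match.groupdict()
            record["line"] = int(record["line"])
            record["message"] = record["message"].strip()
            records.append(record)
    return records


def count_by_source(records):
    """Count warnings per (source file, source line) pair."""
    return Counter((r["file"], r["line"]) for r in records)


sample = """\
0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 327) Valley value was estimated improperly, reset to 2
0:00:35.313 92M / 2G WARN General (kmer_coverage_model.cpp : 366) Failed to determine erroneous kmer threshold. Threshold set to: 2
0:00:18.378 138M / 4G WARN General (kmer_coverage_model.cpp : 218) Too many erroneous kmers, the estimates might be unreliable
"""

records = parse_warnings(sample)
counts = count_by_source(records)
for (source_file, source_line), n in sorted(counts.items()):
    print(f"{source_file}:{source_line}\t{n}")
```

Running the same tally on the `warnings.log` of each run makes it easy to see whether a change in preprocessing removes a warning or only changes the fallback value it reports.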

maesaar avatar Dec 23 '20 15:12 maesaar

Please rerun just SPAdes and check whether the warnings are still there. Note that these warnings are not benign and might indicate a suboptimal assembly.

asl avatar Dec 23 '20 15:12 asl

@asl Does "suboptimal" mean that I have to do additional checks (e.g. QUAST against a known reference) to confirm the assemblies are appropriate to use?

maesaar avatar Dec 23 '20 15:12 maesaar

Last question: if a suboptimal assembly fits the purpose, is it okay to use?

maesaar avatar Dec 23 '20 16:12 maesaar

@maesaar I would be a bit suspicious of the assembly, as suggested by @asl, and I am unsure it would be fit for purpose. As I mentioned in https://github.com/tseemann/shovill/issues/150, our experience suggests there is something going on with the underlying FASTQ data.

andersgs avatar Dec 23 '20 19:12 andersgs

@asl @andersgs What does the warning

Valley value was estimated improperly, reset to 2

mean on its own?

maesaar avatar Dec 23 '20 22:12 maesaar

There might be multiple sources of problems:

  • Issues with the input data (e.g. uneven coverage or coverage gap)
  • Aggressive processing by shovill (that could lead to coverage gaps, uneven coverage, etc.)

This is why I said it would make sense to run SPAdes alone, w/o additional pre-processing tools, wrappers, etc. The warning means that some SPAdes internal sanity checks failed and therefore the result might be suboptimal, e.g. fragmented assembly, misassemblies, etc. Whether it's "ok" or not – it's up to the user to decide.
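As general background on the term "valley": in a k-mer coverage histogram from a well-covered isolate, erroneous k-mers typically pile up at very low coverage, genomic k-mers form a second peak at higher coverage, and the dip between the two is the valley used to separate them. The sketch below is only an illustration of that idea on invented histograms; it is not the model implemented in kmer_coverage_model.cpp, and `find_valley` is a hypothetical helper.

```python
def find_valley(hist):
    """Return the coverage at the first dip that is followed by a rise.

    hist[c] is the number of distinct k-mers observed exactly c times;
    hist[0] is unused and should be 0. Returns None when the histogram
    has no such dip (for example, when it only ever decreases).
    """
    for c in range(1, len(hist) - 1):
        if hist[c] <= hist[c - 1] and hist[c] < hist[c + 1]:
            return c
    return None


# Invented example: error peak at coverage 1, genomic peak near coverage 8.
clean = [0, 900, 200, 40, 10, 30, 120, 300, 420, 310, 130, 30]

# Invented example: counts only decrease, so the two populations overlap.
no_dip = [0, 900, 500, 300, 200, 120, 80, 50, 30, 20, 10, 5]

print(find_valley(clean))
print(find_valley(no_dip))
```

When no clear dip exists, as in the second histogram, any tool that relies on it has to fall back to some default threshold, which is consistent with the "reset to ..." wording of the warning. Very low or very uneven coverage are common reasons for a histogram to look like that.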

asl avatar Dec 23 '20 23:12 asl

@asl I did two SPAdes runs: first with the raw reads, then with the trimmed reads.

I still got the Valley value warnings, so there must be some problem with the sequencing?

maesaar avatar Dec 24 '20 11:12 maesaar

Will you please attach your spades.log files?

asl avatar Dec 24 '20 12:12 asl

Hello again,

Originally I had four FASTQs for R1 and four for R2 (4x2).

a) SPAdes only:

  1. I concatenated the R1s and the R2s with cat and did no trimming: spades.log
  2. I concatenated the R1s and the R2s with cat and did only the trimming that shovill does: spades.log
  3. I did not concatenate the raw R1s and R2s; instead I trimmed each of the four pairs separately (trimming only, as in shovill) and then concatenated the trimmed R1s and R2s with cat: spades.log
  4. I passed the 4x2 R1s and R2s via a YAML file with --dataset, without trimming: spades.log
  5. I passed the 4x2 R1s and R2s via a YAML file with --dataset, trimmed only as in shovill: spades.log
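For readers who want to reproduce setups 4 and 5, here is a rough sketch of generating the dataset description for four read pairs. The file names are placeholders, not the ones used in this issue. The keys (`orientation`, `type`, `left reads`, `right reads`) follow the dataset format described in the SPAdes manual; writing the file as JSON relies on JSON being a subset of YAML, which is an assumption worth checking against your SPAdes version.

```python
import json


def build_dataset(pairs, orientation="fr"):
    """Build a one-library paired-end dataset description.

    pairs is a list of (R1 path, R2 path) tuples. Left and right reads
    are listed in the same order so that mates stay matched.
    """
    return [
        {
            "orientation": orientation,
            "type": "paired-end",
            "left reads": [r1 for r1, _ in pairs],
            "right reads": [r2 for _, r2 in pairs],
        }
    ]


# Placeholder file names for a sample sequenced on four lanes.
pairs = [
    (f"sample_L00{lane}_R1.fastq.gz", f"sample_L00{lane}_R2.fastq.gz")
    for lane in range(1, 5)
]

dataset = build_dataset(pairs)
yaml_text = json.dumps(dataset, indent=2)
print(yaml_text)
```

The printed text can be saved to a file and passed to `spades.py --dataset <file> -o <output_dir>`. The same ordering rule applies to the `cat` approach in setups 1 to 3: R1 and R2 files have to be concatenated in the same lane order, otherwise the read pairs no longer line up.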

b) shovill:

  1. shovill pipeline with the cat-concatenated FASTQs of R1s and R2s: shovill.log

maesaar avatar Dec 26 '20 03:12 maesaar