The N50 is very short from soil sample

Open lulunisrna opened this issue 5 years ago • 2 comments

Dear voutcn,

I have a problem with my MEGAHIT results. My N50 is very short, around 450-550 bp. My sample is from plantation soil. I have read the same issue on this page, where you advised using --min-count 1, but it doesn't work for me. I have also tried running the assembly with --kmin-1pass, but the N50 is still too short, around 500 bp. Now I'm trying --presets meta-large for this assembly, and I hope it will give a good result. If the result is still bad, do you have any advice on how to fix this problem? Thank you.

lulunisrna avatar Jan 21 '20 15:01 lulunisrna

Hi, I've also been having a hard time finding suitable parameters for my soil datasets (#254). Did you manage to improve your N50 somehow?

franciscozorrilla avatar Feb 02 '20 19:02 franciscozorrilla

Soil samples are hard to assemble because:

  1. Biodiversity is very high (too many microorganisms), and many of the organisms are sequenced at very low depth.
  2. Some dominant microorganisms can be sequenced at extremely high depth, which introduces a lot of sequencing errors.

There is no solution to the first problem other than sequencing a lot more data. For the second problem, normalization may help. See https://github.com/voutcn/megahit/issues/239#issuecomment-534373589
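
When comparing runs (for example before and after normalization, or with different parameters), it can help to compute N50 the same way each time. The following is a minimal sketch, not part of MEGAHIT: the helper names `contig_lengths` and `n50` are made up for illustration, and the input is assumed to be a plain FASTA file of contigs, such as the `final.contigs.fa` that MEGAHIT normally writes to its output directory.

```python
import sys


def contig_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return N50: the length of the contig at which the running total,
    taken from the longest contig downwards, first reaches at least half
    of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python n50.py megahit_out/final.contigs.fa
    with open(sys.argv[1]) as handle:
        print(n50(contig_lengths(handle)))
```

Because N50 depends on which contigs are counted, any minimum contig length filter should be kept identical between the runs being compared.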

voutcn avatar Feb 23 '20 23:02 voutcn