megahit
N50 is very short for a soil sample
Dear voutcn,
I have a problem with my MEGAHIT results. The N50 of my assembly is very short, around 450-550 bp. My sample comes from plantation soil. I have read a similar issue on this page, where you advised using --min-count 1, but it doesn't work for me. I have also tried running the assembly with --kmin-1pass, but the N50 is still too short, around 500 bp. Now I'm trying --presets meta-large for this assembly, and I hope it will give a good result. If the result is still bad, do you have any advice on how to fix this problem? Thank you.
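For readers unfamiliar with the metric being discussed: N50 is the contig length at which, going from the longest contig to the shortest, the running total first reaches half of the total assembly length. Below is a minimal sketch of that calculation. The function name `n50` and the example lengths are made up for illustration and are not taken from MEGAHIT or from the data in this thread.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length
    return 0  # only reached for empty input


# Hypothetical contig lengths: many short contigs and one long one.
contig_lengths = [300, 400, 450, 500, 550, 2000]
print(n50(contig_lengths))  # prints 550
```

In this toy example a single long contig does little to raise the N50, because the short contigs still hold half of the total length.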
Hi, I've also been having a hard time finding suitable parameters for my soil datasets (#254). Did you manage to improve your N50 somehow?
Soil samples are hard to assemble because:
- They have very high biodiversity (too many microorganisms), and many of those organisms are sequenced at very low depth
- Some dominant microorganisms can be sequenced at extremely high depth, which introduces a lot of sequencing errors
There is no solution to the first problem other than sequencing a lot more data. For the second problem, normalization may help. See https://github.com/voutcn/megahit/issues/239#issuecomment-534373589
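To illustrate the general idea behind read normalization, here is a toy sketch of median-based digital normalization: a read is kept only while the median count of its k-mers, counted over the reads kept so far, is below a cutoff, so extra copies of already well-covered sequence are dropped. This is an assumption about what "normalization" could look like, not necessarily the method recommended in the linked comment. The function `normalize_reads`, its parameters, and the reads are hypothetical, and real tools also handle FASTQ input, reverse complements, and memory-efficient counting, which this sketch ignores.

```python
from collections import Counter
from statistics import median


def normalize_reads(reads, k=5, cutoff=2):
    """Toy digital normalization over a list of read strings."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read is shorter than k, skip it
        # Keep the read only if its region is not yet well covered.
        if median([counts[kmer] for kmer in kmers]) < cutoff:
            kept.append(read)
            counts.update(kmers)
    return kept


# Hypothetical input: one sequence at high depth, one at low depth.
reads = ["ACGTACGTAC"] * 5 + ["TTGACCGGTA"]
print(normalize_reads(reads, k=5, cutoff=2))
# prints ['ACGTACGTAC', 'TTGACCGGTA']
```

The high-depth sequence is reduced to a single copy while the low-depth read is retained, which is the effect normalization aims for before assembly.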