How to adjust parameters to improve the assembly result

Open linshengnan2020 opened this issue 4 years ago • 8 comments

Hi, I have assembled a 1 Gb diploid genome in which 70~80% of the sequence is repetitive. The coverage of my PacBio data is approximately 40x. The final assembly N50 is only 32 kb. How should I adjust the parameters to improve the assembly result? Could you please give me some advice? Thank you very much!

linshengnan2020 avatar Oct 21 '20 02:10 linshengnan2020

I'm dealing with a much larger genome (26 Gbp) but with similar levels of repeats. It may seem counterintuitive, but increasing the required overlap from the default of 2 kbp to 5 kbp (using the -l flag) has helped my assemblies. When I mapped the raw reads back onto the raw.fa, I noticed a number of locations where the assembly was collapsing around repeats. By increasing the minimum overlap I was able to get rid of a lot of these and improve the overall length of the assembly, at the cost of increasing the number of contigs. However, I'm going to scaffold at a later stage, once I'm done with error correction and polishing, so I should be able to improve things again then. It is better to have more contigs than to have repeat regions collapsed.

shanesturrock avatar Nov 26 '20 19:11 shanesturrock

The solution may include -l, '-R -s' and '--aln-dovetail -1'.

ruanjue avatar Nov 27 '20 00:11 ruanjue

I've been using -p 21 -S 2 --aln-noskip --rescue-low-cov-edges --tidy-reads 5000 -l 5000 but I'm still tweaking and testing. The good thing is the turnaround time is really short due to how fast the program is so I can try different settings and investigate the effects.
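A minimal sketch of how such a sweep over -l could be scripted. The option set is the one quoted above; the input file reads.fa.gz, the thread count, the asm_l* output prefixes, and the list of -l values are placeholders rather than recommendations. The function only prints the commands (a dry run), so they can be reviewed before anything is launched.

```shell
#!/usr/bin/env bash
# Dry run: print one wtdbg2 command per minimum-overlap (-l) value.
# reads.fa.gz, the thread count and the asm_l* prefixes are placeholders.

READS="reads.fa.gz"
THREADS=16

print_sweep() {
    local min_len
    for min_len in "$@"; do
        # Options other than -i/-t/-fo are the ones listed in the comment above.
        echo "wtdbg2 -i $READS -t $THREADS -fo asm_l${min_len}" \
             "-p 21 -S 2 --aln-noskip --rescue-low-cov-edges" \
             "--tidy-reads 5000 -l $min_len"
    done
}

# Hypothetical values to compare; pick ones that suit your read lengths.
print_sweep 3000 5000 8000
```

Giving each run its own output prefix keeps the results side by side, which makes it easier to compare contig counts and total length between settings.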

shanesturrock avatar Nov 27 '20 01:11 shanesturrock

Is there a specific parameter that needs to be adjusted and/or input to wtdbg2 to specify the coverage depth? Or is that irrelevant for the programme to run correctly?

cement-head avatar Dec 08 '20 16:12 cement-head

Have a look at wtdbg2 --help; there are two relevant options, --limit-input and -X.
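For illustration, a dry-run sketch of how a depth setting might appear on the command line. It assumes that -X selects the best reads up to the given depth and only takes effect together with -g (the estimated genome size), which is how the help text describes it as far as I can tell. The values, file name, and output prefix are placeholders. --limit-input is left out because its units should be checked in wtdbg2 --help first.

```shell
# Dry run: print a wtdbg2 command that keeps roughly DEPTH-fold of the input.
GENOME_SIZE="1g"   # placeholder: estimated genome size, passed to -g
DEPTH=40           # placeholder: depth of reads to keep, passed to -X

print_depth_cmd() {
    # reads.fa.gz, the thread count and the asm prefix are placeholders.
    echo "wtdbg2 -i reads.fa.gz -t 16 -fo asm -g $GENOME_SIZE -X $DEPTH"
}

print_depth_cmd
```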

ruanjue avatar Dec 09 '20 01:12 ruanjue

Hi Prof. Ruan,

I have the same question, with similar levels of repeats. Following your advice, I added "-l -R -s --aln-dovetail -1" in this first run.

I am trying to assemble it again. Hope it works.

Thank you!

lifan18 avatar Mar 15 '21 08:03 lifan18

Hi Prof. Ruan,

I tried using the 4 parameters together, but I got a worse result than when I did not add -l -R -s --aln-dovetail -1. Is there any problem with adding the 4 parameters at the same time?

-t 96 -fo Species -l -R -s --tidy-reads 5000 --edge-min 3 --rescue-low-cov-edges --aln-dovetail -1

Hoping for your reply.

Thank you very much!

Li Fan

lifan18 avatar Mar 22 '21 10:03 lifan18

-R works at the step of generating alignments, --aln-dovetail works at the step of filtering alignments, and -s works at both steps. So, you can use a loose -s together with -R in the first run, then use --load-alignments and tune for better results with different parameters.
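The two-stage approach could be sketched roughly as below. This is a dry run that only prints commands. The -s and --aln-dovetail values, the file names, and the run1/run2* prefixes are hypothetical, and it assumes the first run saves its alignments as run1.alignments.gz (check the actual file name in your output directory).

```shell
# Dry run of the two-stage approach: align once, then re-filter saved alignments.
# All values, file names and prefixes are placeholders.
READS="reads.fa.gz"

# Stage 1: generate alignments once, with realignment (-R) and a loose -s.
stage1_cmd() {
    echo "wtdbg2 -i $READS -t 16 -fo run1 -R -s $1"
}

# Stage 2: reuse the stage-1 alignments and vary only the filtering options.
# Assumes stage 1 wrote run1.alignments.gz; verify the name before running.
stage2_cmd() {
    echo "wtdbg2 -i $READS -t 16 -fo $1 --load-alignments run1.alignments.gz -s $2 --aln-dovetail $3"
}

stage1_cmd 0.03
stage2_cmd run2a 0.05 -1
stage2_cmd run2b 0.10 256
```

Because the alignment step is done only once, each stage-2 variant should be comparatively cheap, so several filter settings can be compared against the same set of alignments.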

ruanjue avatar Mar 23 '21 06:03 ruanjue