--polish-target has reduced N50 and largest contig

Open lilypeck opened this issue 5 months ago • 1 comments

Hello

Firstly, thanks for making this great tool.

I have used Flye 2.9.3-b1797 to assemble ONT reads for a plant genome (assembly size 830 Mb). I installed Flye using bioconda. Before running Flye, I removed reads shorter than 5 kb.

flye --nano-hq /u/project/vlsork/ldpeck/longreads/fastq/${INFILE%_*}_ALLpass.fl5kb.fastq.gz \
        --genome-size 830m -o flye-hq-${INFILE%_*} -t 7 --scaffold

Assembly                    assembly  
# contigs (>= 0 bp)         7408      
# contigs (>= 1000 bp)      7407      
# contigs (>= 5000 bp)      7381      
# contigs (>= 10000 bp)     7327      
# contigs (>= 25000 bp)     6986      
# contigs (>= 50000 bp)     6169      
Total length (>= 0 bp)      2947843323
Total length (>= 1000 bp)   2947842862
Total length (>= 5000 bp)   2947760873
Total length (>= 10000 bp)  2947336694
Total length (>= 25000 bp)  2941151963
Total length (>= 50000 bp)  2910673812
# contigs                   7395      
Largest contig              7272865   
Total length                2947819947
GC (%)                      35.47     
N50                         821055    
N90                         189133    
auN                         1272818.7 
L50                         922       
L90                         3789  
# N's per 100 kbp           0.47  

Then I ran --polish-target with two iterations:

flye --polish-target flye-hq-${INFILE%_*}/assembly.fasta \
	--nano-hq /u/project/vlsork/ldpeck/longreads/fastq/${INFILE%_*}_ALLpass.fl5kb.fastq.gz \
	--iterations 2 --threads 7

Assembly                    polished_2
# contigs (>= 0 bp)         7173      
# contigs (>= 1000 bp)      7121      
# contigs (>= 5000 bp)      6960      
# contigs (>= 10000 bp)     6656      
# contigs (>= 25000 bp)     5754      
# contigs (>= 50000 bp)     4806      
Total length (>= 0 bp)      1557125882
Total length (>= 1000 bp)   1557095537
Total length (>= 5000 bp)   1556616863
Total length (>= 10000 bp)  1554379940
Total length (>= 25000 bp)  1538848805
Total length (>= 50000 bp)  1504360171
# contigs                   7034      
Largest contig              5266179   
Total length                1556930564
GC (%)                      35.44     
N50                         487344    
N90                         108678    
auN                         780367.3  
L50                         811       
L90                         3461      
# N's per 100 kbp           0.00 

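You can see that polishing removed the N's (0.47 to 0.00 per 100 kbp) and reduced the total number of contigs (7395 to 7034), but the N50 (821055 to 487344) and the largest contig (7272865 to 5266179) have both decreased. The total length also dropped, from 2947819947 to 1556930564 bp. I have attached both Flye log files: flye.log from the original assembly step and flye_polish.log from the polishing step.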
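
For anyone who wants to cross-check the figures above independently of the reporting tool, N50 and auN can be recomputed from the contig lengths alone. The following is a minimal Python sketch, not part of Flye. The function names `contig_lengths` and `n50_and_aun` are hypothetical, and it assumes uncompressed FASTA input and the usual definitions: N50 is the length of the contig at which the cumulative sum of lengths, taken from longest to shortest, first reaches half the total, and auN is the sum of squared lengths divided by the total length.

```python
import sys


def contig_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_and_aun(lengths):
    """Return (N50, auN) for a non-empty list of contig lengths in bp."""
    if not lengths:
        raise ValueError("no contigs given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        # N50: first contig at which the running sum reaches half the total
        if running * 2 >= total:
            n50 = length
            break
    aun = sum(length * length for length in ordered) / total
    return n50, aun


if __name__ == "__main__":
    # Usage (paths are placeholders): python check_n50.py before.fasta after.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            lengths = contig_lengths(handle)
        n50, aun = n50_and_aun(lengths)
        print(path, len(lengths), sum(lengths), max(lengths), n50, round(aun, 1))
```

Running it on the assembly before and after polishing prints the contig count, total length, largest contig, N50 and auN side by side. Unlike the tables above, this sketch applies no minimum contig length, so small differences in the counts are to be expected.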

Do you know why this might be?

Thanks

Lily

flye.log.gz flye_polish.log

lilypeck, Aug 27 '24 00:08