
Assembly changes from run to run

Open gbdias opened this issue 2 years ago • 2 comments

MBG bioconda 1.0.16

Hi,

  • I am testing MBG on an organellar genome and noticed that the results can change between runs of the program. I couldn't find a seed parameter in the options to make the assembly reproducible. Any tips?

  • Below is an example of two consecutive runs of MBG on the same input with identical parameters. The resulting assemblies differ in length by 4 bp (34948 bp vs. 34944 bp).

(mbg) [guibo205@rackham2 mbg]$ MBG -i mapped_reads.fasta -o asm.gfa -k 2501 -w 2470 -a 10 -u 50 -t 8
MBG bioconda 1.0.16
Parameters: k=2501,w=2470,a=10,u=50,t=8,r=0,R=0,hpcvariantcov=0,errormasking=hpc,endkmers=no,blunt=no,keepgaps=no,guesswork=no,copycountfilter=no,onlylocal=no,filterwithinunitig=yes,cleaning=yes,cache=no
Collecting selected k-mers
Reading sequences from mapped_reads.fasta
5852 total selected k-mers in reads
1583 distinct selected k-mers in reads
Unitigifying
Filtering by unitig coverage
19 distinct selected k-mers in unitigs after filtering
Getting read paths
Reading sequences from mapped_reads.fasta
Building unitig sequences
Reading sequences from mapped_reads.fasta
Writing graph to asm.gfa
selecting k-mers and building graph topology took 0,617 s
unitigifying took 0,0 s
filtering unitigs took 0,0 s
getting read paths took 0,704 s
building unitig sequences took 0,809 s
forcing edge consistency took 0,0 s
writing the graph and calculating stats took 0,2 s
nodes: 1
edges: 1
assembly size 34948 bp, N50 34948
approximate number of k-mers ~ 32447

(mbg) [guibo205@rackham2 mbg]$ MBG -i mapped_reads.fasta -o asm2.gfa -k 2501 -w 2470 -a 10 -u 50 -t 8
MBG bioconda 1.0.16
Parameters: k=2501,w=2470,a=10,u=50,t=8,r=0,R=0,hpcvariantcov=0,errormasking=hpc,endkmers=no,blunt=no,keepgaps=no,guesswork=no,copycountfilter=no,onlylocal=no,filterwithinunitig=yes,cleaning=yes,cache=no
Collecting selected k-mers
Reading sequences from mapped_reads.fasta
5852 total selected k-mers in reads
1583 distinct selected k-mers in reads
Unitigifying
Filtering by unitig coverage
19 distinct selected k-mers in unitigs after filtering
Getting read paths
Reading sequences from mapped_reads.fasta
Building unitig sequences
Reading sequences from mapped_reads.fasta
Writing graph to asm2.gfa
selecting k-mers and building graph topology took 0,620 s
unitigifying took 0,0 s
filtering unitigs took 0,0 s
getting read paths took 0,699 s
building unitig sequences took 0,731 s
forcing edge consistency took 0,0 s
writing the graph and calculating stats took 0,2 s
nodes: 1
edges: 1
assembly size 34944 bp, N50 34944
approximate number of k-mers ~ 32443
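
To see where the two graphs actually diverge, and not just that the totals differ, the segment sequences in asm.gfa and asm2.gfa can be compared directly. The following is a minimal sketch, not something MBG provides: the helper names (`read_segments`, `first_difference`, `compare_gfas`) are made up for illustration. It assumes the GFA files store sequences inline on tab-separated `S` lines (name in column 2, sequence in column 3) and that node names are the same in both runs. It does not account for a circular unitig being written out rotated or reverse-complemented.

```python
import sys


def read_segments(gfa_path):
    """Return {segment_name: sequence} from the S lines of a GFA file."""
    segments = {}
    with open(gfa_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # S lines: record type, segment name, sequence, optional tags
            if len(fields) >= 3 and fields[0] == "S":
                segments[fields[1]] = fields[2]
    return segments


def first_difference(seq_a, seq_b):
    """Index of the first mismatching position, or None if identical."""
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            return i
    if len(seq_a) != len(seq_b):
        # One sequence is a strict prefix of the other
        return min(len(seq_a), len(seq_b))
    return None


def compare_gfas(path_a, path_b):
    """Yield (name, len_a, len_b, first_diff) for segments that differ.

    Assumption: segments are matched by name; a segment missing from one
    file is treated as an empty sequence.
    """
    segs_a = read_segments(path_a)
    segs_b = read_segments(path_b)
    for name in sorted(set(segs_a) | set(segs_b)):
        seq_a = segs_a.get(name, "")
        seq_b = segs_b.get(name, "")
        diff = first_difference(seq_a, seq_b)
        if diff is not None:
            yield (name, len(seq_a), len(seq_b), diff)


if __name__ == "__main__" and len(sys.argv) == 3:
    for row in compare_gfas(sys.argv[1], sys.argv[2]):
        print(*row, sep="\t")
```

Saved under a name of your choosing and run with `asm.gfa asm2.gfa` as arguments, this prints one tab-separated row per differing node: name, both lengths, and the 0-based offset of the first mismatch. That offset is a starting point for inspecting the region by eye.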

gbdias, Dec 14 '23 17:12

Hi, this is a bug. Could you share the input reads?

maickrau, Dec 18 '23 08:12