svmu
svmu copied to clipboard
Loss of chr information in cm.lastz.txt and sv.lastz.txt
Hello mahulchak, After I run svmu with a genome with two reference genomes, the out put of cm.lastz.txt and sv.lastz.txt shows no complete chromosomes for one reference genome .
The code for lastz and svmu are the same, but different in nucmer, as shown in the following.
NUCMER for refer1
nucmer --maxmatch --noextend --prefix=Sample_ref1 ../reference_genome1.fasta ../sample.genomic.fa
NUCMER for refer2
nucmer --prefix=Sample_ref2 ../reference_genome2.fasta ../sample.genomic.fa
lastz
/path/to/lastz//lastz-distrib-1.04.03/src/lastz_32 ../reference_genome1.fasta[multiple] ../sample.genomic.fa --notransition --step=20 --nogapped --progress=1 --format=maf > lastzSample_ref1.txt
svmu
/path/to/svmu-master/svmu Sample_ref1 ../reference_genome1.fasta ../sample.genomic.fa 5 l lastzSample_ref1.txt svmuSample_ref1
The output for reference1 2.4K Aug 7 17:43 cm.lastzAD3_AD1.txt.txt 2.6G Aug 7 18:11 cnv_all.lastzAD3_AD1.txt.txt 2.4G Aug 7 18:11 cords.lastzAD3_AD1.txt.txt 0 Aug 7 18:11 small.lastzAD3_AD1.txt.txt 3.2K Aug 7 18:11 sv.lastzAD3_AD1.txt.txt
The output for reference2 3.7M Jul 19 02:59 cm.lastzAD3_AD2.txt.txt 631M Jul 19 03:02 cnv_all.lastzAD3_AD2.txt.txt 648M Jul 19 03:02 cords.lastzAD3_AD2.txt.txt 0 Jul 19 03:02 small.lastzAD3_AD2.txt.txt 11M Jul 19 03:37 sv.lastzAD3_AD2.txt.txt
The big difference is the file size for reference1, files cm.lastzAD3_AD1.txt.txt and sv.lastzAD3_AD1.txt.txt are small and there is no complete information for all the chromosomes. And cnv_all and cords are large. Actually, reference1 and reference2 genome size are near and sequence have a little high similarity. How could the result will be such different changing NUCMER parameters? How can I define accurate SVs for the sample with appropriate parameters?
Looking forward to hearing from you. Thanks a lot! Best wishes, Clement
Hi Clement,
I think the ref1 run did not complete properly. You can run both without -maxmatch. I have also pushed an updated today. Not sure if that'll be of any help to you but feel free to try the updated version. The new version works better when the maxmatch option is NOT used.
Hi Mahulchak, Thank you for your reply. The above mentioned reference genomes are larger than 2Gb. According to your updated information, the "-maxmatch " might be only suitable for small genomes, right? So I abandoned this parameters in later use. Thanks again!
yes. --maxmatch and --noextend may not work well for large genomes.