Loss of chr information in cm.lastz.txt and sv.lastz.txt

Open clementmo opened this issue 4 years ago • 3 comments

Hello mahulchak, I ran svmu on one sample genome against two reference genomes. For one of the reference genomes, the output files cm.lastz.txt and sv.lastz.txt do not contain complete information for all chromosomes.

The lastz and svmu commands were the same for both references; only the nucmer commands differed, as shown below.

NUCMER for reference1

nucmer --maxmatch --noextend --prefix=Sample_ref1 ../reference_genome1.fasta ../sample.genomic.fa

NUCMER for reference2

nucmer --prefix=Sample_ref2 ../reference_genome2.fasta ../sample.genomic.fa

lastz

/path/to/lastz//lastz-distrib-1.04.03/src/lastz_32 ../reference_genome1.fasta[multiple] ../sample.genomic.fa --notransition --step=20 --nogapped --progress=1 --format=maf > lastzSample_ref1.txt

svmu

/path/to/svmu-master/svmu Sample_ref1 ../reference_genome1.fasta ../sample.genomic.fa 5 l lastzSample_ref1.txt svmuSample_ref1

The output for reference1:
2.4K Aug 7 17:43 cm.lastzAD3_AD1.txt.txt
2.6G Aug 7 18:11 cnv_all.lastzAD3_AD1.txt.txt
2.4G Aug 7 18:11 cords.lastzAD3_AD1.txt.txt
0 Aug 7 18:11 small.lastzAD3_AD1.txt.txt
3.2K Aug 7 18:11 sv.lastzAD3_AD1.txt.txt

The output for reference2:
3.7M Jul 19 02:59 cm.lastzAD3_AD2.txt.txt
631M Jul 19 03:02 cnv_all.lastzAD3_AD2.txt.txt
648M Jul 19 03:02 cords.lastzAD3_AD2.txt.txt
0 Jul 19 03:02 small.lastzAD3_AD2.txt.txt
11M Jul 19 03:37 sv.lastzAD3_AD2.txt.txt

The big difference is in the file sizes: for reference1, cm.lastzAD3_AD1.txt.txt and sv.lastzAD3_AD1.txt.txt are small and do not contain complete information for all chromosomes, while cnv_all and cords are large. The reference1 and reference2 genomes are close in size and their sequences are fairly similar. How can changing the NUCMER parameters make the results so different? How can I choose appropriate parameters to call SVs accurately for this sample?

Looking forward to hearing from you. Thanks a lot! Best wishes, Clement

clementmo, Aug 07 '20 16:08
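
The size mismatch described in this post can be checked quickly before inspecting file contents. The following is a minimal sketch, not part of svmu: report_sizes is a hypothetical helper that prints the byte count of each file it is given, and the file names are the ones listed above. A very small cm or sv file next to multi-gigabyte cnv_all and cords files may point to a run worth re-examining, but no threshold is assumed here.

```shell
# Hypothetical helper (not part of svmu): print "<bytes> <file>" for each
# file given, or "missing <file>" if it does not exist.
report_sizes() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      # wc -c < file prints only the byte count; tr removes any padding
      bytes=$(wc -c < "$f" | tr -d ' ')
      printf '%s %s\n' "$bytes" "$f"
    else
      printf 'missing %s\n' "$f"
    fi
  done
}

# File names as listed in this issue; run from the svmu output directory.
report_sizes cm.lastzAD3_AD1.txt.txt cm.lastzAD3_AD2.txt.txt \
             sv.lastzAD3_AD1.txt.txt sv.lastzAD3_AD2.txt.txt
```

Comparing the same file type across the two reference runs makes it easier to see which run produced unexpectedly little output.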

Hi Clement,

I think the ref1 run did not complete properly. You can run both without --maxmatch. I have also pushed an update today. I'm not sure if it will be of any help to you, but feel free to try the updated version. The new version works better when the --maxmatch option is NOT used.

mahulchak, Aug 08 '20 05:08

Hi Mahulchak, Thank you for your reply. The reference genomes mentioned above are larger than 2 Gb. Based on your update, "--maxmatch" might only be suitable for small genomes, right? So I dropped this parameter in later runs. Thanks again!

clementmo, Aug 11 '20 03:08

Yes. --maxmatch and --noextend may not work well for large genomes.

mahulchak, Aug 11 '20 05:08
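
The advice in this thread amounts to aligning both references with the same nucmer settings, leaving out --maxmatch and --noextend. The following is a sketch under that assumption: build_nucmer_cmd is a hypothetical helper, the paths are the placeholders from the original post, and the commands are only printed (a dry run), not executed. The lastz and svmu steps from the original post would follow unchanged.

```shell
# Hypothetical helper: print a nucmer command that uses default settings
# (no --maxmatch, no --noextend), in the form used for reference2 above.
build_nucmer_cmd() {
  ref_label="$1"    # e.g. ref1
  ref_fasta="$2"    # reference genome FASTA
  query_fasta="$3"  # sample genome FASTA
  printf 'nucmer --prefix=Sample_%s %s %s\n' "$ref_label" "$ref_fasta" "$query_fasta"
}

# Dry run: print the command for each reference so both runs are comparable.
for n in 1 2; do
  build_nucmer_cmd "ref$n" "../reference_genome$n.fasta" "../sample.genomic.fa"
done
```

Printing the commands first makes it easy to confirm that the two runs differ only in the reference file and prefix before starting long alignments on genomes of this size.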