
merging different SV type?

Open leone93 opened this issue 1 year ago • 2 comments

Version : Truvari v4.2.2-dev 1cd03b2f4e8afdb3431595fa351501b36db3cfd8

Describe the bug :

I'm trying to use Truvari to merge a file containing SV calls from 4 different callers. I first combined the calls with SURVIVOR instead of bcftools merge, because the latter has problems putting together the differing fields from the 4 callers. I ran SURVIVOR in a very stringent mode to avoid over-merging and leave that part to Truvari. I then ran Truvari collapse with --intra on the file and it completed without errors. Afterwards I checked the difference between the two VCF files using bcftools isec and noticed that, at least in this case, Truvari seems to discard data or, worse, merge different SV types that start at more or less the same position.

To Reproduce : truvari collapse --intra --keep maxqual --gt het --chain -i sample_merged_10bp.sorted.vcf.gz -o truvari-dev_merge.vcf -c truvari-dev_collapsed.vcf -f /home/fabbial/reference/all.chrs.con.fasta

Expected behavior : A clear and concise description of what you expected to happen.

Example Data : If applicable/possible, add example data to help recreate your problem.

Additional context : Original file:

Chr10   15693   pbsv.INS.9527   G       GGTTAATGTGGACCCCCGTTTTTATAACAGGAATCAATCGTGTAAACAGTACCATTTCCCTGGATCAAGTAGTTTGGTACACACGAATTCATAGCTGGAATATCAAGTGCACATACACGAGTGTTTAATACGAATTATTACATCATAGGCCCGCAGGCATAGCCTTAAATCATCGGACCGAGAGGTCCGGAATACATCCAAAAGCAGAAATAGATAAGGCAGCGGAATTCAGGCAGCGGCATGCCATTGCCACAGGCAACGACCGTGCTAAGACCTACTGGACGCCATCGTCTTCATCTCCTTCCTGATAATAGGCATAGGGATCCTCCGAGGCTTCATCTCCCACTTCTGATAAATTTTATATTTGCAAGGATGAGTACCAACCGTACTCAGCAAGCCACCACAGCAACAATGCATATGAAAGGGGGAGTTCAAAGGATGGCCATAGTTCTTTTGCGCAAAGCAAGTTTTGTAATTCTTTTCACAAGCCTAAGACCTAGCATTGACTGATCAAATTTTTAGTACCAGAGTTTGTATTTAAACAACGACGGTTCTGTCCACCATCCATTGTGATCCCAAAGCTTCCCGCCATTGATTCGTCATGGTTTTCTGAGGACGTCCACCTTCCCGCCTCTCAGGAAGTGGCTCCAACAGCATAAAATTCATCATGCAATATCCCATCCCACACAAGTTAAGAATTTAGAGTCTAGCCAAGTGTAATACATGTCCCGGTGCTCAATAACCGCGAGCACGGCTATTCGAATAGGTTTGGTTTACTCACACTGCAGTGGATGTACACTTTACCCGCACTCCGCGACTGCCCAACACATGAGCCTCGTCCCAACACATGAGACGCGTCACGGCAAAGCTTTTCGATAACCTCGCATTGGCAGTACCCGCTCCAGGAACTTTTCATCCTCATGCACTCTAGGAATACACGGTTTCTAGCAGTGAGAGGAGTTCTGGCGCACCCGGGAAGGGAAGACTCACACATGCATTAAGTTATAATTATGTTTTAGATTCTCACATGGCAGTCCTACCGATGGCGACACCACTGTAGACACCCTCCTCGCGGTCCTACCAATGGCTGCCCCACCGTAGAGCCCCTGCCTCACACATCAAGAAACCACTATGCATGGATACTGCCTCCGCTCAGCTATCTACTCCGCTAGGTCTATACCCATACGAGAAGTGCGGTTGTACGGGGGTCGTTTCATGCTTAACCTCATGGCTCGGTCCTTAATTGACCAGGGACGGCACTAGCCTTTTCCGGACACCACCCAAGTCCTCCAGCCGCCCCAGTCGAAAACAGTTGTTTTACTTTATTTTCCTTTCACAAATTATGTCATCAATATCATGGCAATGTGGCGCTCATGTCTCCACATGCCGCATCTCAATTACCTTCCCAAAGGTAATTGCCCAAGCATATAGCATTTGATAAATATGAGTATGCATGAATCTAAAATAGCATTTCTAAGCAAGTGTCATAGTTGACTAGGGACTCGTACGTATCCATGGTTACAAAGATTTAAAGGTGAACAATAATCAAGGCATGGCATAATCACAAGTAGGAGGTTCATAATTGCATGCAATTTTATTTATAAACAAAAGAATTTCGCAATTGGGATCAACATGTTCAAGGAATAGTGATGACTTGCCTTGCTCGAGGTCTTGCGGGTCTTGGCCTTCACCTGGATCCGCGGCTCCCTCGGTCTCTATAGTTACGTGCGAAAATTGATTTGAATTCGGTTAGAATTCAAATAAAAATCCAAGTAAATCCGAATGGAAGTCGAACGCGAAAGTCAATTCCTTTTTATTAATTTTACTATCCGCGAACTATGGCAAACCCCAATTTTGGTTTAAATTATTTTGCGGTTACAATATTTTGTTGTCGTATGTTTTTAATTTAATCTACCCTAGCATTATATCCATATATTAGGTTAGAAATTTCTTATCGCGA
GCTAAATGTCGGGCGGAATCCTAAAATTATCTTATAATATTATACGACTTAATTTAGTCTATGATTAAATATAATACACGGTTAACACCCTAGTAATTAAATCCGAATCGCTACCGTTGATCGATTATTTATAGAGATTACCCAGAAATAATCCACAATAATTTACGAGAATTCATACATTTGTTTAATTATTAATTACATCTAAATTAATACCGCGAATTGATTTCTTATGGAGAGTACTAAAAGTTATCTATAATCTATAGGAATTTATCCCATTATATTTTCATTAATCCTATATTTAAATGCTAGTATTTTAGGTATTTAATTAGTAAGAGTTTATCTAACTATATTTTCATTAATCCTACAATTAAATTCTAATATTTGCAAATATTTTAAATTTCCCCCTAAATTTTTCTTTCTCTTTTTCTTTCCCTTTTTCCTCTCCTCTCTTTTCTTTTCTTTTTTCTTTCTCTCCCTTTCTTTTCTTTTCTCTCCTGGCTTCCTCCTCCCTTCCTCTCCTTCTCTTTCTCGGCTCTCCCTCTCTTTCTCTCGGCTCTCTCTCTTTCCCACCCGAGCGGTGGCGGCGGCGGACTCGAGGGAAACGGAGGGCGACTCATCGGCGGCGGCAGCGGCGATGACGGCGGCAGCGACGGCGGCGCACGGTGGCGCGAGAACGGCGGTGCGGGCACGGAAACGGTCGGCGCGGCAGCACGACGGCGGCGGTGCGGCGGCACTACGATGGCGGCGCGACGGCGTGGAGGAGTGGGTGGTGGAGACGGGGAGAAGGGGGGGGGAATAGTGAGGCTTTTATAGGGGAAGGGAGAGAGATAAGGGGGAAGGAGGAGGAGGGAAGAGGGAAGAGGGGAAGAGGGGAAGAAGAAGAGGGGAAAGAGAGGAGGGGGAGGGGCGGCGACGCGGCTGCGGTGACGGGATGGCGGCGCGGGGCTCGGCGCGCGGCGGGACGCGAGACGCGACGGCGACGGATGAGCGGCGACGGGACGGCGACGCGACGGGCGACGGCGCGCGGCGATGGCGACGAGCGGGCGGCGCGGCGCGGGGCTCGGAGCGGCTCGGCGCGGGAGGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGCGACGGTGATGCGACGGCGAGCGACGCGGCAGAGGGGAGGCGGAGCGGCGCGCGTGGCAGGGGAGGGGAAAGGCTGGGGACCAGGTCGAACACGTGGCGGGCAATGAACAGTGCACTTTTCCAATAAACCGATTTTAGAGTGTTTCTCATATGAATTTGATTCCGAAATTCTTAATTTTTTGCATAAATGAAGTTTTACCCCATATTTATATTATTCTAACTAAAGATTCACCTAATTTAATATCACTCATATTTTGTTTATATAATTCATTTGAATTTTTAATTAAAGTTAATTCTCATTCCATCGTATTAAAATTTAATTGTTGTTAATATGGTTGCGATAACATTTTATTTATTTCCAAACCCACCTAATCTTTATTTTAATTTATATTTTAATTATTTATTTAGCCCACTTGATTTTTAGGGTTTATTCCTAGTTAATTTCCTCCCATTTGTGATCGATGAAATCCGAAATCAAAATCCAATAAAATCTTCGAATAAAATTGGCATGATGCAATTTATTTAAAAAGTTTTTTTTTTTTTGAAGATCAGAATTTTTTTGGAGTCTTTGATTTTGTTGGTCGAATTTTCAGAATGTTACA  133     PASS    
SUPP=3;SUPP_VEC=1011;SVLEN=3827;SVTYPE=INS;SVMETHOD=SURVIVOR1.0.7;CHR2=Chr10;END=15693;CIPOS=0,1;CIEND=-6,1;STRANDS=+-  GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO     1/1:NA:3823:1,15:+-:133:INS:cuteSV.INS.838:G:GGTTAATGTGGACCCCCGTTTTTATAACAGGAATCAATCGTGTAAACAGTACCATTTCCCTGGATCAAGTAGTTTGGTACACACGAATTCATAGCTGGAATATCAAGTGCACATACACGAGTGTTTAATACGAATTATTACATCATAGGCCGCAGGCATAGCCTTAAATCATCGGACCGAGAGGTCCGGAATACATCCAAAAGCAGAAATAGATAAGGCAGCGGAATTCAGGCAGCGGCATGCCATTGCCACAGGCAACGACCGTGCTAAGACCTACTGGACGCCATCGTCTTCATCTCCTTCCTGATAATAGGCATAGGGATCCTCCGAGGCTTCATCTCCCACTTCTGATAAATTTTATATTTGCAAGGATGAGTACCAACCGTACTCAGCAAGCCACCACAGCAACAATGCATATGAAAGGGGGAGTTCAAAGGATGGCCATAGTTCTTTTGCGCAAAGCAAGTTTTGTAATTCTTTTCACAAGCCTAAGACCTAGCATTGACTGATCAAATTTTTAGTACCAGAGTTTGTATTTAAACAACGACGGTTCTGTCCACCATCCATTGTGATCCCAAAGCTTCCCGCCATTGATTCGTCATGGTTTTCTGAGGACGTCCACCTTCCCGCCTCTCAGGAAGTGGCTCCAACAGCATAAAATTCATCATGCAATATCCCATCCCACACAAGTTAAGAATTTAGAGTCTAGCCAAGTGTAATACATGTCCCGGTGCTCAATAACCGCGAGCACGGCTATTCGAATAGGTTTGGTTTACTCACACTGCAGTGGATGTACACTTTACCCGCACTCCGCGACTGCCCAACACATGAGCCTCGTCCCAACACATGAGACGCGTCACGGCAAAGCTTTTCGATAACCTCGCATTGGCAGTACCCGCTCCAGGAACTTTTCATCCTCATGCACTCTAGGAATACACGGTTTCTAGCAGTGAGAGGAGTTCTGGCGCACCCGGGAAGGGAAGACTCACACATGCATTAAGTTATAATTATGTTTTAGATTCTCACATGGCAGTCCTACCGATGGCGACACCACTGTAGACACCCTCCTCGCGGTCCTACCAATGGCTGCCCCACCGTAGAGCCCCTGCCTCACACATCAAGAAACCACTATGCATGGATACTGCCTCCGCTCAGCTATCTACTCCGCTAGGTCTATACCCATACGAGAAGTGCGGTTGTACGGGGGTCGTTTCATGCTTAACCTCATGGCTCGGTCCTTAATTGACCAGGGACGGCACTAGCCTTTTCCGGACACCACCCAAGTCCTCCAGCCGCCCCAGTCGAAAACAGTTGTTTTACTTTATTTTCCTTTCACAAATTATGTCATCAATATCATGGCAATGTGGCGCTCATGTCTCCACATGCCGCATCTCAATTACCTTCCCAAAGGTAATTGCCCAAGCATATAGCATTTGATAAATATGAGTATGCATGAATCTAAAATAGCATTTCTAAGCAAGTGTCATAGTTGACTAGGGACTCGTACGTATCCATGGTTACAAAGATTTAAAGGTGAACAATAATCAAGGCATGGCATAATCACAAGTAGGAGGTTCATAATTGCATGCAATTTTATTTATAAACAAAAGAATTTCGCAATTGGGATCAACATGTTCAAGGAATAGTGATGACTTGCCTTGCTCGAGGTCTTGCGGGTCTTGGCCTTCACCTGGATCCGCGGCTCCCTCGGTCTCTATAGTTACGTGCGAAAATTGATTTGAATTCGGTTAGAATTCAAATAAAAATCCAAGTAAATCCGAATGGAAGTCGAACG
CGAAAGTCAATTCCTTTTTATTAATTTTACTATCCGCGAACTATGGCAAACCCCAATTTTGGTTTAAATTATTTTGCGGTTACAATATTTTGTTGTCGTATGTTTTTAATTTAATCTACCCTAGCATTATATCCATATATTAGGTTAGAAATTTCTTATCGCGAGCTAAATGTCGGGCGGAATCCTAAAATTATCTTATAATATTATACGACTTAATTTAGTCTATGATTAAATATAATACACGGTTAACACCCTAGTAATTAAATCCGAATCGCTACCGTTGATCGATTATTTATAGAGATTACCCAGAAATAATCCACAATAATTTACGAGAATTCATACATTTGTTTAATTATTAATTACATCTAAATTAATACCGCGAATTGATTTCTTATGGAGAGTACTAAAAGTTATCTATAATCTATAGGAATTTATCCCATTATATTTTCATTAATCCTATATTTAAATGCTAGTATTTTAGGTATTTAATTAGTAAGAGTTTATCTAACTATATTTTCATTAATCCTACAATTAAATTCTAATATTTGCAAATATTTTAAATTTCCCCCTAAATTTTTCTTTCTCTTTTTCTTTCCCTTTTTCCTCTCCTCTCTTTTCTTTCTTTTTTCTTTCTCTCCCTTTCTTTTCTTTTCTCTCCTGGCTTCCTCCTCCCTTCCTCTCCTTCTCTTTCTCGGCTCTCCCTCTCTTTCTCTCGGCTCTCTCTCTTTCCCACCCGAGCGGTGGCGGCGGCGGACTCGAGGGAAACGGAGGGCGACTCATCGGCGGCGGCAGCGGCGATGACGGCGGCAGCGACGGCGGCGCACGGTGGCGCGAGAACGGCGGTGCGGGCACGGAAACGGTCGGCGCGGCAGCACGACGGCGGCGGTGCGGCGGCACTACGATGGCGGCGCGACGGCGTGGAGGAGTGGGTGGTGGAGACGGGGAGAAGGGGGGGGGAATAGTGAGGCTTTTATAGGGGAAGGGAGAGAGATAAGGGGGAAGGAGGAGGAGGGAAGAGGGAAGAGGGGAAGAGGGGAAGAAGAAGAGGGGAAAGAGAGAGGGGGAGGGGCGGCGACGCGGCTGCGGTGACGGGATGGCGGCGCGGGGCTCGGCGCGCGGCGGGACGCGAGACGCGACGGCGACGGATGAGCGGCGACGGGACGGCGACGCGACGGGCGACGGCGCGCGGCGATGGCGACGAGCGGGCGGCGCGGCGCGGGGCTCGGAGCGGCTCGGCGCGGGAGGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGCGACGGTGATGCGACGGCGAGCGACGCGGCAGAGGGGAGGCGGAGCGGCGCGCGTGGCAGGGGAGGGGAAAGGCTGGGGACCAGGTCGAACACGTGGCGGGCAATGAACAGTGCACTTTTCCAATAAACCGATTTTAGAGTGTTTCTCATATGAATTTGATTCCGAAATTCTTAATTTTTTGCATAAATGAAGTTTTACCCCATATTTATATTATTCTAACTAAAGATTCACCTAATTTAATATCACTCATATTTTGTTTATATAATTCATTTGAATTTTTAATTAAAGTTAATTCTCATTCCATCGTATTAAAATTTATTGTTGTTAATATGGTTGCGATAACATTTTATTTATTTCCAAACCCACCTAATCTTTATTTAATTTATATTTTAATTATTTATTTAGCCCACTTGATTTTTAGGGTTTATTCCTAGTTAATTTCCTCCCATTTGTGATCGATGAAATCCGAAATCAAAATCCAATAAAATCTTCGAATAAAATTGGCATGATGCAATTTATTTAAAAAGTTTTTTTTTTTTGAAGATCAGAATTTTTTTGGAGTCTTTGAT
TTTGTTGGTCGAATTTTCAGAATGTTACA:Chr10_15693-Chr10_15693   ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN        1/1:NA:3829:0,0:+-:.:INS:pbsv.INS.9527:G:GGTTAATGTGGACCCCCGTTTTTATAACAGGAATCAATCGTGTAAACAGTACCATTTCCCTGGATCAAGTAGTTTGGTACACACGAATTCATAGCTGGAATATCAAGTGCACATACACGAGTGTTTAATACGAATTATTACATCATAGGCCCGCAGGCATAGCCTTAAATCATCGGACCGAGAGGTCCGGAATACATCCAAAAGCAGAAATAGATAAGGCAGCGGAATTCAGGCAGCGGCATGCCATTGCCACAGGCAACGACCGTGCTAAGACCTACTGGACGCCATCGTCTTCATCTCCTTCCTGATAATAGGCATAGGGATCCTCCGAGGCTTCATCTCCCACTTCTGATAAATTTTATATTTGCAAGGATGAGTACCAACCGTACTCAGCAAGCCACCACAGCAACAATGCATATGAAAGGGGGAGTTCAAAGGATGGCCATAGTTCTTTTGCGCAAAGCAAGTTTTGTAATTCTTTTCACAAGCCTAAGACCTAGCATTGACTGATCAAATTTTTAGTACCAGAGTTTGTATTTAAACAACGACGGTTCTGTCCACCATCCATTGTGATCCCAAAGCTTCCCGCCATTGATTCGTCATGGTTTTCTGAGGACGTCCACCTTCCCGCCTCTCAGGAAGTGGCTCCAACAGCATAAAATTCATCATGCAATATCCCATCCCACACAAGTTAAGAATTTAGAGTCTAGCCAAGTGTAATACATGTCCCGGTGCTCAATAACCGCGAGCACGGCTATTCGAATAGGTTTGGTTTACTCACACTGCAGTGGATGTACACTTTACCCGCACTCCGCGACTGCCCAACACATGAGCCTCGTCCCAACACATGAGACGCGTCACGGCAAAGCTTTTCGATAACCTCGCATTGGCAGTACCCGCTCCAGGAACTTTTCATCCTCATGCACTCTAGGAATACACGGTTTCTAGCAGTGAGAGGAGTTCTGGCGCACCCGGGAAGGGAAGACTCACACATGCATTAAGTTATAATTATGTTTTAGATTCTCACATGGCAGTCCTACCGATGGCGACACCACTGTAGACACCCTCCTCGCGGTCCTACCAATGGCTGCCCCACCGTAGAGCCCCTGCCTCACACATCAAGAAACCACTATGCATGGATACTGCCTCCGCTCAGCTATCTACTCCGCTAGGTCTATACCCATACGAGAAGTGCGGTTGTACGGGGGTCGTTTCATGCTTAACCTCATGGCTCGGTCCTTAATTGACCAGGGACGGCACTAGCCTTTTCCGGACACCACCCAAGTCCTCCAGCCGCCCCAGTCGAAAACAGTTGTTTTACTTTATTTTCCTTTCACAAATTATGTCATCAATATCATGGCAATGTGGCGCTCATGTCTCCACATGCCGCATCTCAATTACCTTCCCAAAGGTAATTGCCCAAGCATATAGCATTTGATAAATATGAGTATGCATGAATCTAAAATAGCATTTCTAAGCAAGTGTCATAGTTGACTAGGGACTCGTACGTATCCATGGTTACAAAGATTTAAAGGTGAACAATAATCAAGGCATGGCATAATCACAAGTAGGAGGTTCATAATTGCATGCAATTTTATTTATAAACAAAAGAATTTCGCAATTGGGATCAACATGTTCAAGGAATAGTGATGACTTGCCTTGCTCGAGGTCTTGCGGGTCTTGGCCTTCACCTGGATCCGCGGCTCCCTCGGTCTCTATAGTTACGTGCGAAAATTGATTTGAATTCGGTTAGAATTCAAATAAAAATCCAAGTAAATCCGAATGGAAGTCGAACGCGAAAGTCAATTCCTTTTTATTAATTTTACTATCCGCGAACTATGGCAAACCCCAATTT
TGGTTTAAATTATTTTGCGGTTACAATATTTTGTTGTCGTATGTTTTTAATTTAATCTACCCTAGCATTATATCCATATATTAGGTTAGAAATTTCTTATCGCGAGCTAAATGTCGGGCGGAATCCTAAAATTATCTTATAATATTATACGACTTAATTTAGTCTATGATTAAATATAATACACGGTTAACACCCTAGTAATTAAATCCGAATCGCTACCGTTGATCGATTATTTATAGAGATTACCCAGAAATAATCCACAATAATTTACGAGAATTCATACATTTGTTTAATTATTAATTACATCTAAATTAATACCGCGAATTGATTTCTTATGGAGAGTACTAAAAGTTATCTATAATCTATAGGAATTTATCCCATTATATTTTCATTAATCCTATATTTAAATGCTAGTATTTTAGGTATTTAATTAGTAAGAGTTTATCTAACTATATTTTCATTAATCCTACAATTAAATTCTAATATTTGCAAATATTTTAAATTTCCCCCTAAATTTTTCTTTCTCTTTTTCTTTCCCTTTTTCCTCTCCTCTCTTTTCTTTTCTTTTTTCTTTCTCTCCCTTTCTTTTCTTTTCTCTCCTGGCTTCCTCCTCCCTTCCTCTCCTTCTCTTTCTCGGCTCTCCCTCTCTTTCTCTCGGCTCTCTCTCTTTCCCACCCGAGCGGTGGCGGCGGCGGACTCGAGGGAAACGGAGGGCGACTCATCGGCGGCGGCAGCGGCGATGACGGCGGCAGCGACGGCGGCGCACGGTGGCGCGAGAACGGCGGTGCGGGCACGGAAACGGTCGGCGCGGCAGCACGACGGCGGCGGTGCGGCGGCACTACGATGGCGGCGCGACGGCGTGGAGGAGTGGGTGGTGGAGACGGGGAGAAGGGGGGGGGAATAGTGAGGCTTTTATAGGGGAAGGGAGAGAGATAAGGGGGAAGGAGGAGGAGGGAAGAGGGAAGAGGGGAAGAGGGGAAGAAGAAGAGGGGAAAGAGAGGAGGGGGAGGGGCGGCGACGCGGCTGCGGTGACGGGATGGCGGCGCGGGGCTCGGCGCGCGGCGGGACGCGAGACGCGACGGCGACGGATGAGCGGCGACGGGACGGCGACGCGACGGGCGACGGCGCGCGGCGATGGCGACGAGCGGGCGGCGCGGCGCGGGGCTCGGAGCGGCTCGGCGCGGGAGGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGCGACGGTGATGCGACGGCGAGCGACGCGGCAGAGGGGAGGCGGAGCGGCGCGCGTGGCAGGGGAGGGGAAAGGCTGGGGACCAGGTCGAACACGTGGCGGGCAATGAACAGTGCACTTTTCCAATAAACCGATTTTAGAGTGTTTCTCATATGAATTTGATTCCGAAATTCTTAATTTTTTGCATAAATGAAGTTTTACCCCATATTTATATTATTCTAACTAAAGATTCACCTAATTTAATATCACTCATATTTTGTTTATATAATTCATTTGAATTTTTAATTAAAGTTAATTCTCATTCCATCGTATTAAAATTTAATTGTTGTTAATATGGTTGCGATAACATTTTATTTATTTCCAAACCCACCTAATCTTTATTTTAATTTATATTTTAATTATTTATTTAGCCCACTTGATTTTTAGGGTTTATTCCTAGTTAATTTCCTCCCATTTGTGATCGATGAAATCCGAAATCAAAATCCAATAAAATCTTCGAATAAAATTGGCATGATGCAATTTATTTAAAAAGTTTTTTTTTTTTTGAAGATCAGAATTTTTTTGGAGTCTTTGATTTTGTTGGTCGAATTTTCAGAATGTTACA:Chr10_15693-Chr10_15693 
1/1:NA:3829:0,0:+-:57:INS:Sniffles2.INS.0S9:N:GTTAATGTGGACCCCCGTTTTTATAACAGGAATCAATCGTGTAAACAGTACCATTTCCCTGGATCAAGTAGTTTGGTACACACGAATTCATAGCTGGAATATCAAGTGCACATACACGAGTGTTTAATACGAATTATTACATCATAGGCCCGCAGGCATAGCCTTAAATCATCGGACCGAGAGGTCCGGAATACATCCAAAAGCAGAAATAGATAAGGCAGCGGAATTCAGGCAGCGGCATGCCATTGCCACAGGCAACGACCGTGCTAAGACCTACTGGACGCCATCGTCTTCATCTCCTTCCTGATAATAGGCATAGGGATCCTCCGAGGCTTCATCTCCCACTTCTGATAAATTTTATATTTGCAAGGATGAGTACCAACCGTACTCAGCAAGCCACCACAGCAACAATGCATATGAAAGGGGGAGTTCAAAGGATGGCCATAGTTCTTTTGCGCAAAGCAAGTTTTGTAATTCTTTTCACAAGCCTAAGACCTAGCATTGACTGATCAAATTTTTAGTACCAGAGTTTGTATTTAAACAACGACGGTTCTGTCCACCATCCATTGTGATCCCAAAGCTTCCCGCCATTGATTCGTCATGGTTTTCTGAGGACGTCCACCTTCCCGCCTCTCAGGAAGTGGCTCCAACAGCATAAAATTCATCATGCAATATCCCATCCCACACAAGTTAAGAATTTAGAGTCTAGCCAAGTGTAATACATGTCCCGGTGCTCAATAACCGCGAGCACGGCTATTCGAATAGGTTTGGTTTACTCACACTGCAGTGGATGTACACTTTACCCGCACTCCGCGACTGCCCAACACATGAGCCTCGTCCCAACACATGAGACGCGTCACGGCAAAGCTTTTCGATAACCTCGCATTGGCAGTACCCGCTCCAGGGAACTTTTCATCCTCATGCACTCTAGGAATACACGGTTTCTAGCAGTGAGAGGAGTTCTGGCGCACCCGGGAAGGGAAGACTCACACATGCATTAAGTTATAATTATGTTTTAGATTCTCACATGGCAGTCCTACCGATGGCGACACCACTGTAGACACCCTCCTCGCGGTCCTACCAATGGCTGCCCCACCGTAGAGCCCCTGCCTCACACATCAAGAAACCACTATGCATGGATACTGCCTCCGCTCAGCTATCTACTCCGCTAGGTCTATACCCATACGAGAAGTGCGGTTGTACGGGGGTCGTTTCATGCTTAACCTCATGGCTCGGTCCTTAATTGACCAGGGACGGCACTAGCCTTTTCCGGACACCACCCAAGTCCTCCAGCCGCCCCAGTCGAAAACAGTTGTTTTACTTTATTTTCCTTTCACAAATTATGTCATCAATATCATGGCAATGTGGCGCTCATGTCTCCACATGCCGCATCTCAATTACCTTCCCAAAGGTAATTGCCCAAGCATATAGCATTTGATAAATATGAGTATGCATGAATCTAAAATAGCATTTCTAAGCAAGTGTCATAGTTGACTAGGGACTCGTACGTATCCATGGTTACAAAGATTTAAAGGTGAACAATAATCAAGGCATGGCATAATCACAAGTAGGAGGTTCATAATTGCATGCAATTTTATTTATAAACAAAAGAATTTCGCAATTGGGATCAACATGTTCAAGGAATAGTGATGACTTGCCTTGCTCGAGGTCTTGCGGGTCTTGGCCTTCACCTGGATCCGCGGCTCCCTCGGTCTCTATAGTTACGTGCGAAAATTGATTTGAATTCGGTTAGAATTCAAATAAAAATCCAAGTAAATCCGAATGGAAGTCGAACGCGAAAGTCAATTCCTTTTTATTAATTTTACTATCCGCGAACTATGGCAAACCCCAATTTTGGTTTAAATTATTTTGCGGTTACAATATTTTGTTGTCGTATGTTTTTAATTTAATCTACCCTAGCATTATATCCATATATTAGGTTAGAAATTTCTTA
TCGCGAGCTAAATGTCGGGCGGAATCCTAAAATTATCTTATAATATTATACGACTTAATTTAGTCTATGATTAAATATAATACACGGTTAACACCCTAGTAATTAAATCCGAATCGCTACCGTTGATCGATTATTTATAGAGATTACCCAGAAATAATCCACAATAATTTACGAGAATTCATACATTTGTTTAATTATTAATTACATCTAAATTAATACCGCGAATTGATTTCTTATGGAGAGTACTAAAAGTTATCTATAATCTATAGGAATTTATCCCATTATATTTTCATTAATCCTATATTTAAATGCTAGTATTTTAGGTATTTAATTAGTAAGAGTTTATCTAACTATATTTTCATTAATCCTACAATTAAATTCTAATATTTGCAAATATTTTAAATTTCCCCCTAAATTTTTCTTTCTCTTTTTCTTTCCCTTTTTCCTCTCCTCTCTTTTCTTTCTTTTTTCTTTCTCTCCCTTTCTTTTCTTTTCTCTCCTGGCTTCCTCCTCCCTTCCTCTCCTTCTCTTTCTCGGCTCTCCCTCTCTTTCTCTCGGCTCTCTCTCTTTCCCACCCGAGCGGTGGCGGCGGCGGACTCGAGGGAAACGGAGGGCGACTCATCGGCGGCGGCAGCGGCGATGACGGCGGCAGCGACGGCGGCGCACGGTGGCGCGAGAACGGCGGTGCGGGCACGGAAACGGTCGGCGCGGCAGCACGACGGCGGCGGTGCGGCGGCACTACGATGGCGGCGCGACGGCGTGGAGGAGTGGGTGGTGGAGACGGGGAGAAGGGGGGGGAATAGTGAGGCTTTTATAGGGGAAGGGAGAGAGATAAGGGGGAAGGAGGAGGAGGGAAGAGGGAAGAGGGGAAGAGGGGAAGAAGAAGAGGGGAAAGAGAGGAGGGGGAGGGGCGGCGACGCGGCTGCGGTGACGGGATGGCGGCGCGGGGCTCGGCGCGCGGCGGGACGCGAGACGCGACGGCGACGGATGAGCGGCGACGGGACGGCGACGCGACGGGCGACGGCGCGCGGCGATGGCGACGAGCGGGCGGCGCGGCGCGGGGCTCGGAGCGGCTCGGCGCGGGAGGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGGCGGTGACGGGGTGCGTGGGCGCGCGGGGCGAGGTGGCAGGCGGCGATGGGACGGCGACGCGACACGGGACGGCGACGCGACGGCGACGGTGATGCGACGGCGAGCGACGCGGCAGAGGGGAGGCGGAGCGGCGCGCGTGGCAGGGGAGGGGAAAGGCTGGGGACCAGGTCGAACACGTGGCGGGCAATGAACAGTGCACTTTTCCAATAAACCGATTTTAGAGTGTTTCTCATATGAATTTGATTCCGAAATTCTTAATTTTTTGCATAAATGAAGTTTTACCCCATATTTATATTATTCTAACTAAAGATTCACCTAATTTAATATCACTCATATTTTGTTTATATAATTCATTTGAATTTTTAATTAAAGTTAATTCTCATTCCATCGTATTAAAATTTAATTGTTGTTAATATGGTTGCGATAACATTTTATTTATTTCCAAACCCACCTAATCTTTATTTTAATTTATATTTTAATTATTTATTTAGCCCACTTGATTTTTAGGGTTTATTCCTAGTTAATTTCCTCCCATTTGTGATCGATGAAATCCGAAATCAAAATCCAATAAAATCTTCGAATAAAATTGGCATGATGCAATTTATTTAAAAAGTTTTTTTTTTTTTTGAAGATCAGAATTTTTTTGGAGTCTTTGATTTTGTTGGTCGAATTTTCAGAATGTTACA:Chr10_15694-Chr10_15694
Chr10   15694   svim.BND.21671  N       N[Chr5:29772388[        13      PASS    SUPP=2;SUPP_VEC=1100;SVLEN=0;SVTYPE=TRA;SVMETHOD=SURVIVOR1.0.7;CHR2=Chr5;END=29772387;CIPOS=0,0;CIEND=0,1;STRANDS=++    GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO     0/1:NA:29756694:22,10:++:13:TRA:cuteSV.BND.1288:NA:NA:Chr10_15694-Chr5_29772388 ./.:NA:29756693:0,0:++:11:TRA:svim.BND.21671:NA:NA:Chr10_15694-Chr5_29772387    ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN        ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN
Chr10   15699   svim.BND.21672  N       N[Chr5:29768562[        56      PASS    SUPP=3;SUPP_VEC=1101;SVLEN=0;SVTYPE=TRA;SVMETHOD=SURVIVOR1.0.7;CHR2=Chr5;END=29768562;CIPOS=0,1;CIEND=-1,0;STRANDS=++   GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO     0/1:NA:29752862:22,10:++:13:TRA:cuteSV.BND.1289:NA:NA:Chr10_15700-Chr5_29768562 ./.:NA:29752863:0,0:++:8:TRA:svim.BND.21672:NA:NA:Chr10_15699-Chr5_29768562     ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN        0/1:NA:29752861:0,0:++:56:TRA:Sniffles2.BND.635S9:NA:NA:Chr10_15700-Chr5_29768561

Truvari merged file: only the insertion remains; the two translocations are discarded.
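
One way to pin down exactly which records went missing is to compare record keys between the input and the merged output. The following is only a minimal sketch, assuming plain-text, tab-separated VCF lines; the helper names `record_keys` and `missing_records` are hypothetical and are not part of Truvari or bcftools.

```python
def record_keys(vcf_lines):
    """Map (CHROM, POS, ID) to the SVTYPE found in INFO for each record line."""
    keys = {}
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, info = fields[0], int(fields[1]), fields[2], fields[7]
        svtype = "."
        for entry in info.split(";"):
            if entry.startswith("SVTYPE="):
                svtype = entry.split("=", 1)[1]
        keys[(chrom, pos, vid)] = svtype
    return keys


def missing_records(input_lines, output_lines):
    """Return sorted ((CHROM, POS, ID), SVTYPE) pairs present in input only."""
    before = record_keys(input_lines)
    after = record_keys(output_lines)
    return sorted((key, svtype) for key, svtype in before.items() if key not in after)
```

Listing the SVTYPE next to each missing key makes it easy to see whether the lost records are all of one type, for example only the TRA entries.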

Thanks for your work

leone93, Mar 17 '24 15:03

Hello,

I am unable to recreate this issue. When I put the following input through truvari collapse -i file.vcf.gz -o out.vcf using v4.2.2-dev

##fileformat=VCFv4.2
##contig=<ID=Chr10,length=248956422,md5=6aef897c3d6ff0c78aff06ac189178dd>
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=SUPP,Number=1,Type=String,Description="Variant ID">
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="Variant ID">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Variant type">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Variant length">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA24385	2:NA24385	3:NA24385	four
Chr10	15693	pbsv.INS.9527	G	GGTTAATGTGGACCCCCGTTTTT	133	PASS	SUPP=3;SUPP_VEC=1011;SVLEN=3827;SVTYPE=INS	GT	1/1	./.	1/1	1/1
Chr10	15694	svim.BND.21671	N	N[Chr5:29772388[	13	PASS	SUPP=2;SUPP_VEC=1100;SVLEN=0;SVTYPE=TRA	GT	0/1	./.	./.	./.
Chr10	15699	svim.BND.21672	N	N[Chr5:29768562[	56	PASS	SUPP=3;SUPP_VEC=1101;SVLEN=0;SVTYPE=TRA	GT	0/1	./.	./.	0/1

I get the output

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=Chr10,length=248956422,md5=6aef897c3d6ff0c78aff06ac189178dd>
##INFO=<ID=SUPP,Number=1,Type=String,Description="Variant ID">
##INFO=<ID=SUPP_VEC,Number=1,Type=String,Description="Variant ID">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Variant type">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Variant length">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=NumCollapsed,Number=1,Type=Integer,Description="Number of calls collapsed into this call by truvari">
##INFO=<ID=CollapseId,Number=1,Type=String,Description="Truvari uid to help tie output.vcf and output.collapsed.vcf entries together">
##INFO=<ID=NumConsolidated,Number=1,Type=Integer,Description="Number of samples consolidated into this call by truvari">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA24385	2:NA24385	3:NA24385	four
Chr10	15693	pbsv.INS.9527	G	GGTTAATGTGGACCCCCGTTTTT	133	PASS	SUPP=3;SUPP_VEC=1011;SVLEN=3827;SVTYPE=INS	GT	1/1	./.	1/1	1/1
Chr10	15694	svim.BND.21671	N	N[Chr5:29772388[	13	PASS	SUPP=2;SUPP_VEC=1100;SVLEN=0;SVTYPE=TRA	GT	0/1	./.	./.	./.
Chr10	15699	svim.BND.21672	N	N[Chr5:29768562[	56	PASS	SUPP=3;SUPP_VEC=1101;SVLEN=0;SVTYPE=TRA	GT	0/1	./.	./.	0/1

Note that I had to remove most of the INFO/FORMAT fields to make this test file. So it is possible the INFO/FORMAT fields are somehow stopping the BNDs from being processed? Perhaps the same fields that were preventing bcftools from running on the input?
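
For reference, the kind of trimming described above can be sketched in Python. This is an illustration under stated assumptions, not the script actually used to build the test file: the function name `simplify_record` and the `KEEP_INFO` tuple are hypothetical, every record is assumed to carry a GT key in FORMAT, and header lines are not handled.

```python
# INFO keys to keep; chosen to match the reduced test file shown above.
KEEP_INFO = ("SUPP", "SUPP_VEC", "SVLEN", "SVTYPE")


def simplify_record(line, keep_info=KEEP_INFO):
    """Keep only selected INFO keys and the GT value of each sample."""
    fields = line.rstrip("\n").split("\t")
    # Filter INFO entries by key name (text before the first '=').
    info = [e for e in fields[7].split(";") if e.split("=", 1)[0] in keep_info]
    fields[7] = ";".join(info) if info else "."
    if len(fields) > 9:
        # Locate GT in FORMAT, then reduce each sample column to that value.
        gt_index = fields[8].split(":").index("GT")
        fields[8] = "GT"
        fields[9:] = [sample.split(":")[gt_index] for sample in fields[9:]]
    return "\t".join(fields)
```

Running the full records and the simplified records through the same collapse command would show whether the extra fields are what changes the outcome.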

ACEnglish, Mar 18 '24 13:03

I'm trying to build a parser to simplify the SURVIVOR output so that it is more similar to the bcftools output (and to your example). The problem with using bcftools to merge multiple callers (pbsv, svim, sniffles and cutesv) is that some fields are represented in different ways (and SURVIVOR is more elastic with that). I will reply again once I have tried Truvari with a simplified version, and also with the one you added here. Thank you Adam for the help!

leone93, Mar 18 '24 16:03

Sounds good. Just know that SURVIVOR being elastic with the fields doesn't necessarily mean it is handling them properly. It doesn't have the most stringent VCF handling. It may be worthwhile to work on some pre-processing of the individual callers' results to make the headers and fields compatible before they are fed into bcftools.
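
A first step in that pre-processing could be to find which INFO/FORMAT fields the callers define differently. The sketch below is a rough illustration only: `field_definitions` and `conflicting_fields` are hypothetical helper names, the caller labels are whatever keys you pass in, and the regular expression assumes header lines list ID, Number and Type in that order.

```python
import re

# Matches e.g. ##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="...">
HEADER_RE = re.compile(r"^##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)")


def field_definitions(header_lines):
    """Map (kind, field ID) to (Number, Type) for each INFO/FORMAT header line."""
    defs = {}
    for line in header_lines:
        match = HEADER_RE.match(line)
        if match:
            kind, field_id, number, value_type = match.groups()
            defs[(kind, field_id)] = (number, value_type)
    return defs


def conflicting_fields(headers_by_caller):
    """Return fields whose (Number, Type) differs between callers."""
    seen = {}
    for caller, lines in headers_by_caller.items():
        for key, definition in field_definitions(lines).items():
            seen.setdefault(key, {})[caller] = definition
    return {
        key: per_caller
        for key, per_caller in seen.items()
        if len(set(per_caller.values())) > 1
    }
```

Any field reported here would need to be renamed, dropped, or redefined consistently in each caller's VCF before merging.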

ACEnglish, Mar 18 '24 18:03