Issue for VCF conversion when two translocations start from the same position

Open nvnieuwk opened this issue 1 year ago • 2 comments

Hi, I have a VCF that contains two translocations that start at the same position in the genome. This causes problems when converting the TSV to VCF with variantconvert, because both of these translocations have the same ID. The following error is raised:

$ cat 20240220_AnnotSV/annot_test.variantconvert.log

python3 /usr/local/share/python3/variantconvert//variantconvert convert -i 20240220_AnnotSV/annot_test.tsv -o 20240220_AnnotSV/annot_test.vcf -fi annotsv -fo vcf -c /usr/local/share/python3/variantconvert//configs/GRCh38/annotsv3_from_vcf.json

2024-02-20 10:25:52 [INFO] running variantconvert 1.2.2
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/share/python3/variantconvert//variantconvert/__main__.py", line 226, in <module>
    main()
  File "/usr/local/share/python3/variantconvert//variantconvert/__main__.py", line 209, in main
    main_convert(args)
  File "/usr/local/share/python3/variantconvert//variantconvert/__main__.py", line 74, in main_convert
    converter.convert(args.inputFile, args.outputFile)
  File "/usr/local/share/python3/variantconvert//variantconvert/converters/vcf_from_annotsv.py", line 463, in convert
    info_dic = self._build_info_dic()
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/share/python3/variantconvert//variantconvert/converters/vcf_from_annotsv.py", line 219, in _build_info_dic
    merged_annots = self._merge_full_and_split(df_variant)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/share/python3/variantconvert//variantconvert/converters/vcf_from_annotsv.py", line 171, in _merge_full_and_split
    raise ValueError(
ValueError: Each variant is assumed to only have one single line of 'full' annotation

This error was really confusing, since these are different variants. I think this could be solved easily by appending a unique identifier to the ID (a number stating which variant this is should be enough to make all IDs unique). Does this sound like a good/feasible solution to you?
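To make the numbering idea concrete, here is a minimal sketch in plain Python. The helper name `make_ids_unique` is hypothetical and is not part of AnnotSV or variantconvert; it only illustrates appending a running number to repeated IDs. Where such a step would be applied (in AnnotSV itself or to its output) is left open.

```python
def make_ids_unique(ids):
    """Return a copy of `ids` in which every repeated ID gets a numeric suffix.

    Hypothetical helper for illustration only; not an AnnotSV or
    variantconvert function.
    """
    used = set()
    unique_ids = []
    for sv_id in ids:
        candidate = sv_id
        n = 1
        # Keep bumping the suffix until the candidate has not been used yet.
        while candidate in used:
            n += 1
            candidate = f"{sv_id}_{n}"
        used.add(candidate)
        unique_ids.append(candidate)
    return unique_ids


# Two variants that received the same ID, plus an unrelated (made-up) one.
ids = ["X_83731310_83732436_TRA_1", "X_83731310_83732436_TRA_1", "other_id"]
print(make_ids_unique(ids))
```

The first occurrence keeps its original ID and only later duplicates are renamed, so IDs that are already unique are left untouched.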

You can find my full logs and input VCF below, and you can reproduce the issue with the following command:

AnnotSV -outputFile annot_test.tsv -SVinputFile annotsv_issue.vcf -vcf 1

Thanks! -Nicolas

Input VCF: annotsv_issue.vcf.gz

Output folder: 20240220_AnnotSV.tar.gz

nvnieuwk avatar Feb 20 '24 10:02 nvnieuwk

Hi,

AnnotSV considers that a translocation consists of a pair of 2 breakends. With <TRA> angle-bracketed notation, AnnotSV returns only 1 full annotation for the breakend of the pair described with "#CHROM/POS/ALT" (to be improved).

Here are your 2 TRA:

#CHROM  POS             ID                     REF       ALT   QUAL     FILTER   INFO                                                                           FORMAT          sample1
chrX    83731873        0_delly_TRA_35224       T       <TRA>   46      LowQual CHR2=chr1;CIEND=-563,563;CIPOS=-563,563;END=102454994;SVLEN=1;SVTYPE=TRA        GT:PE:SR        0/1:7,2:0,0
chrX    83731873        0_delly_TRA_35225       T       <TRA>   69      LowQual CHR2=chr1;CIEND=-563,563;CIPOS=-563,563;END=111581087;SVLEN=1;SVTYPE=TRA        GT:PE:SR        0/1:6,3:0,0
  • AnnotSV annotates only the chrX:83731873 breakend (twice, although it is the same breakend), which gives both records the same ID: X_83731310_83732436_TRA_1 (CIEND=-563,563;CIPOS=-563,563)
  • AnnotSV does not annotate the chr1:102454994 and chr1:111581087 breakends.
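As an illustration of the two breakends per record, the following sketch reads CHR2 and END from the INFO column of the two lines above. It is plain Python, not AnnotSV code; the function names `parse_info` and `breakends` are hypothetical, and it assumes that CHR2/END describe the second breakend of a `<TRA>` record, as in this VCF.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'A=1;B=2' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def breakends(vcf_line):
    """Return ((chrom, pos), (chr2, end)) for one <TRA> record.

    Assumes the second breakend is given by INFO CHR2 and END.
    """
    cols = vcf_line.split()
    info = parse_info(cols[7])
    first = (cols[0], int(cols[1]))
    second = (info["CHR2"], int(info["END"]))
    return first, second


records = [
    "chrX 83731873 0_delly_TRA_35224 T <TRA> 46 LowQual "
    "CHR2=chr1;CIEND=-563,563;CIPOS=-563,563;END=102454994;SVLEN=1;SVTYPE=TRA "
    "GT:PE:SR 0/1:7,2:0,0",
    "chrX 83731873 0_delly_TRA_35225 T <TRA> 69 LowQual "
    "CHR2=chr1;CIEND=-563,563;CIPOS=-563,563;END=111581087;SVLEN=1;SVTYPE=TRA "
    "GT:PE:SR 0/1:6,3:0,0",
]

for record in records:
    print(breakends(record))
```

Both records share the first breakend (chrX:83731873) and differ only in the second one, which is why an ID built from the first breakend alone cannot tell them apart.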

I will keep in mind:

  • to make all IDs unique
  • to add the parsing of CHR2=chr1;END=102454994 with angle-bracketed notation (currently, AnnotSV does not annotate this breakend). Unfortunately, this will not be possible in the near future.

Best,

Véronique

lgmgeo avatar Feb 20 '24 13:02 lgmgeo

Thank you for the thorough answer. I'll try to find a workaround in the meantime.

nvnieuwk avatar Feb 20 '24 13:02 nvnieuwk