svtype = variant.info.get("SVTYPE", "SV")

Open Jokendo-collab opened this issue 1 year ago • 6 comments

I am getting the following error and I am not sure how to get around it. I used the following command to annotate my VCF:

duphold -v macope2_sorted.vcf -b ../macOpe2.sorted.bam -f /data/okendojo/datashare/macOpeProject/macOpe2Assembly.fasta -t 24 -o mc.vcf

However, I cannot get the SVTYPE information.

(samplot) [okendojo@cn0798 bcgFile]$ samplot vcf --vcf mc.vcf  -d test -O png -b ../macOpe2.sorted.bam 
Traceback (most recent call last):
  File "/vf/users/okendojo/conda/envs/samplot/bin/samplot", line 10, in <module>
    sys.exit(main())
  File "/vf/users/okendojo/conda/envs/samplot/lib/python3.10/site-packages/samplot/__main__.py", line 31, in main
    args.func(parser, args, extra_args)
  File "/vf/users/okendojo/conda/envs/samplot/lib/python3.10/site-packages/samplot/samplot_vcf.py", line 1133, in vcf
    commands, table_data = generate_commands(
  File "/vf/users/okendojo/conda/envs/samplot/lib/python3.10/site-packages/samplot/samplot_vcf.py", line 949, in generate_commands
    svtype = variant.info.get("SVTYPE", "SV")
  File "pysam/libcbcf.pyx", line 2711, in pysam.libcbcf.VariantRecordInfo.get
ValueError: Invalid header

Jokendo-collab, May 26 '23 18:05

File "/vf/users/okendojo/conda/envs/samplot/lib/python3.10/site-packages/samplot/samplot_vcf.py", line 949, in generate_commands
  svtype = variant.info.get("SVTYPE", "SV")
File "pysam/libcbcf.pyx", line 2711, in pysam.libcbcf.VariantRecordInfo.get
ValueError: Invalid header

It seems the VCF might not be formatted correctly; specifically, the header appears to be missing the SVTYPE INFO field. You should see a line like this:

##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">

If this line does not appear in your VCF, you should add it. For example, with bcftools you could do something like this:

  1. Get the current header:
bcftools view -h mc.vcf > header.txt
  2. Edit header.txt in a text editor of your choice to add the missing line (you can copy the line above), placing it among the other ## lines before the final #CHROM line, and save the result as new_header.txt.
  3. Replace the current VCF header:
bcftools reheader -h new_header.txt mc.vcf > mc.reformat.vcf
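
The same header edit can also be scripted. Below is a minimal sketch using only the Python standard library, assuming an uncompressed, plain-text VCF; `add_svtype_header` is a hypothetical helper, not part of samplot or bcftools. Note that it only adds the header definition: records that carry no SVTYPE value are left unchanged, so it cannot turn short-variant calls into SV calls.

```python
# Hypothetical helper (not part of samplot/bcftools); works on plain-text VCF.
SVTYPE_LINE = '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">'


def add_svtype_header(vcf_text: str) -> str:
    """Return vcf_text with the SVTYPE INFO definition added if it is missing.

    The new line goes immediately before the '#CHROM' column-header line,
    because VCF requires all '##' meta-information lines to precede it.
    """
    lines = vcf_text.splitlines(keepends=True)
    # Leave the file untouched if SVTYPE is already declared.
    if any(line.startswith("##INFO=<ID=SVTYPE,") for line in lines):
        return vcf_text
    for i, line in enumerate(lines):
        if line.startswith("#CHROM"):
            lines.insert(i, SVTYPE_LINE + "\n")
            return "".join(lines)
    raise ValueError("no #CHROM header line found; is this a VCF?")
```

For example, reading mc.vcf into a string, passing it through `add_svtype_header`, and writing the result to mc.reformat.vcf mirrors the three bcftools steps.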

If this does not work it would be good to look at actual VCF. Would it be possible to share the VCF or a subset of it here?

pontushojer, May 27 '23 09:05

@pontushojer Thanks for getting back to me. I tried this and it did not work. I am sharing my sorted BAM and VCF so you can have a look. The files can be downloaded from:

  1. VCF: https://hpc.nih.gov/~okendojo/output_file.vcf.gz
  2. BAM: https://hpc.nih.gov/~okendojo/macOpe2.sorted.bam

Give it a try and let me know how it goes.

Jokendo-collab, May 27 '23 14:05

Thanks for providing the VCF!

I see the issue now. The VCF you provided does not contain any structural variants; as far as I can tell, it only has short (<60 bp) variant calls from Dragen. These are not appropriate to visualise with samplot, as it relies mostly on read-level information. Such short variants are better visualised at the base level, using something like IGV.
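
One rough way to confirm that a VCF holds only short variants is to look at the allele lengths in the REF and ALT columns. The sketch below is a heuristic under stated assumptions, not necessarily how this VCF was inspected: it treats symbolic (e.g. `<DEL>`) or breakend ALT alleles as SV-style records and uses the 60 bp figure mentioned above as the cut-off. `allele_span` and `summarize_lengths` are hypothetical names.

```python
def allele_span(record_line: str):
    """Longest REF/ALT allele length for one VCF data line.

    Returns None for symbolic (e.g. <DEL>) or breakend ALT alleles, which is
    how SV callers usually encode large events (an assumption of this sketch).
    """
    fields = record_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
        return None
    return max(len(ref), *(len(a) for a in alts))


def summarize_lengths(vcf_text: str, threshold: int = 60):
    """Return (short, other): records with all explicit alleles < threshold bp vs the rest."""
    short = other = 0
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        span = allele_span(line)
        if span is not None and span < threshold:
            short += 1
        else:
            other += 1
    return short, other
```

If `other` comes back as 0 for a whole file, that is a strong hint the VCF came from a short-variant caller rather than an SV caller.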

Related to this, it would be useful if samplot gave a more informative error message here. It should check that the VCF contains structural variants, i.e. records with the INFO/SVTYPE tag.
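
Such a check could look roughly like the sketch below. It is hypothetical (samplot does not provide `find_svtype_problems`), uses only the standard library on uncompressed VCF text, and returns readable messages up front instead of failing deep inside pysam as in the traceback above.

```python
def find_svtype_problems(vcf_text: str) -> list:
    """Return human-readable problems that would stop an SVTYPE-based tool.

    Hypothetical pre-flight check; an empty list means no problems found.
    """
    problems = []
    declared = False
    records = with_svtype = 0
    for line in vcf_text.splitlines():
        if line.startswith("##INFO=<ID=SVTYPE,"):
            declared = True
        elif line and not line.startswith("#"):
            records += 1
            info = line.split("\t")[7]  # INFO is the 8th VCF column
            keys = {item.split("=", 1)[0] for item in info.split(";")}
            if "SVTYPE" in keys:
                with_svtype += 1
    if not declared:
        problems.append("header has no ##INFO=<ID=SVTYPE,...> definition")
    if records and with_svtype == 0:
        problems.append(f"none of the {records} records carries INFO/SVTYPE; "
                        "the VCF may contain only short variants")
    return problems
```

Checking the records as well as the header matters here: as this thread shows, adding the header line alone does not help when no record is actually a structural variant.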

pontushojer, May 28 '23 09:05

Do you think that if I run the standard GATK on the sorted BAM file I will be able to get the SV information? Otherwise, which tool would you suggest I use in this case? We need these SVs. Let me know.

Jokendo-collab, May 28 '23 11:05

If by "standard GATK" you mean HaplotypeCaller, then no. Structural variants require other callers; examples for short reads are smoove and Manta. If you are already using Dragen for short variants, maybe check out the Dragen SV caller. Or search around a bit; there are plenty to choose from.

pontushojer, May 28 '23 11:05

@pontushojer Thanks for the pointers. I was able to use Manta and got the right VCF file.

Jokendo-collab, May 30 '23 17:05