Discrepancy of count for intron_variant effect and INTRON region

Open maggs-x opened this issue 1 year ago • 0 comments

Describe the issue

Hi, I recently ran SnpEff on a VCF generated by the Minigraph-Cactus pangenome pipeline. I noticed that the intron_variant count and the INTRON count in the region table do not match. See the screenshot below.

Screenshot 2024-02-13 at 5 30 26 PM

To Reproduce

  1. SnpEff version: SnpEff 5.2a

  2. Genome version: Astyanax mexicanus version 3. I built a SnpEff database, which I named Amex3, from the dataset found here: https://ftp.ensembl.org/pub/rapid-release/species/Astyanax_mexicanus/GCA_023375975.1/ensembl/

  3. SnpEff full command line: java -Xmx8g -jar /storage/hpc/group/warrenlab/users/maggsx/snpEff/snpEff.jar Amex3 Amex-pg_wave_clean_sorted_norm_0dotsremoved_annotated_everythingEXCEPTsvdels.vcf > Amex-pg_wave_clean_sorted_norm_0dotsremoved_annotated_everythingEXCEPTsvdels_snpeff.vcf Example_file.txt

  4. Output / Error message: Please include detailed information, such as transcript ID, VCF line output, etc. None

Expected behavior

The intron_variant count should match the INTRON count.

Data

Attached.

Additional context

I'm curious which variant calls are excluded from the region count (INTRON). For example, is a variant that occurs in multiple populations counted only once? Or is a variant that affects introns of multiple genes counted separately for each gene in the intron_variant count but not in the INTRON count?

For a little context, this VCF includes large structural variants. Thanks for your help!
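One way to probe these hypotheses is to tally the intron_variant annotations directly from the annotated VCF. The sketch below is only an illustration, not a description of how SnpEff computes its summary tables. It assumes the standard ANN layout, with the effect in the second subfield and the gene name in the fourth. The function names `parse_ann` and `tally_effect` are hypothetical helpers.

```python
from collections import Counter


def parse_ann(info):
    """Return the ANN entries of a VCF INFO string, each split on '|'."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            return [entry.split("|") for entry in item[4:].split(",")]
    return []


def tally_effect(vcf_lines, effect="intron_variant"):
    """Count how often `effect` appears per annotation and per variant line."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # Combined effects are joined with '&', e.g.
        # splice_region_variant&intron_variant
        hits = [
            e for e in parse_ann(fields[7])
            if len(e) > 3 and effect in e[1].split("&")
        ]
        if not hits:
            continue
        counts["variants"] += 1
        counts["annotations"] += len(hits)
        counts["combined_with_other_effects"] += sum(1 for e in hits if "&" in e[1])
        if len({e[3] for e in hits}) > 1:
            counts["variants_in_multiple_genes"] += 1
    return counts


# Example usage (file name taken from the command line above):
# with open("Amex-pg_wave_clean_sorted_norm_0dotsremoved_annotated_everythingEXCEPTsvdels_snpeff.vcf") as fh:
#     print(tally_effect(fh))
```

If `annotations` is much larger than `variants`, or `variants_in_multiple_genes` is non-zero, that would be consistent with a single variant contributing several intron_variant entries. It would not by itself show which of these the summary tables count.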

maggs-x, Feb 13 '24 06:02