Large DEL filtered by Sniffles

Open ziphra opened this issue 2 years ago • 5 comments

Hello,

I am using Sniffles Version 2.0.7.

We used nanopore sequencing with 18X median coverage to recover an 8Mb deletion.

Variant calling with Sniffles did not call this deletion, either with default parameters or with --long-del-coverage 1.

But running Sniffles with the --no-qc option as suggested in https://github.com/fritzsedlazeck/Sniffles/issues/366 showed that this deletion was filtered because of COV_CHANGE:

chr2	220393162	Sniffles2.DEL.59CAS1	N	<DEL>	60	COV_CHANGE	PRECISE;SVTYPE=DEL;SVLEN=-8514152;END=228907314;SUPPORT=6;COVERAGE=14,8,16,10,15;STRAND=+-;AF=0.545;STDEV_LEN=0.000;STDEV_POS=0.000	GT:GQ:DR:DV	0/1:33:5:6

Indeed, as I understand the COV_CHANGE filter, (14+15)/2*1 = 14.5 is still smaller than the coverage near the center of this deletion (I assumed the svcall.coverage_center used for COV_CHANGE filtering is the coverage near the center, so 16 here).

In the future, I guess we could set a very high value for --long-del-coverage to not miss this kind of deletion.

However, I feel that large coverage variation is to be expected for such a large deletion. Maybe using the mean coverage would be more appropriate for large deletions? Here, the mean coverage would be 11, so just under the threshold in our case with --long-del-coverage=1.
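To make the arithmetic above concrete, here is a minimal Python sketch of the check as it is described in this post. It is not taken from the Sniffles source: the function name `cov_change_flagged` and the `use_mean` switch are hypothetical, and the order of the five COVERAGE values (upstream, start, center, end, downstream) is an assumption.

```python
from statistics import mean


def cov_change_flagged(coverage, long_del_coverage=1.0, use_mean=False):
    """Return True if the call would be flagged under the check described above.

    coverage: the five COVERAGE values from the INFO field, assumed to be
    ordered upstream, start, center, end, downstream.
    use_mean: hypothetical alternative that compares the mean of the three
    in-deletion values instead of the center value alone.
    """
    upstream, start, center, end, downstream = coverage
    # Threshold: mean flanking coverage scaled by --long-del-coverage.
    threshold = (upstream + downstream) / 2.0 * long_del_coverage
    inside = mean([start, center, end]) if use_mean else center
    return inside > threshold


cov = [14, 8, 16, 10, 15]  # COVERAGE of the 8 Mb deletion reported above
print(cov_change_flagged(cov))                 # center (16) vs. threshold (14.5)
print(cov_change_flagged(cov, use_mean=True))  # mean of 8, 16, 10 vs. threshold (14.5)
```

Under these assumptions the center value alone exceeds the flanking threshold, while the mean of the three in-deletion values does not, which is the behaviour the suggestion above is aiming for.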

Also, STDEV_LEN and STDEV_POS are both 0, which could be taken into account for variant filtering.

Thank you,

ziphra avatar Nov 15 '22 17:11 ziphra

Hi, we are facing a similar problem with Sniffles v2.0.7, also using ONT data. The deletion

chr14 88391507 Sniffles2.DEL.181SE gttgcat...caatttagttcttt N 60 COV_MIN PRECISE;SVTYPE=DEL;SVLEN=-31666;END=88423173;SUPPORT=15;COVERAGE=13,0,0,0,14;STRAND=+-;STDEV_LEN=0.000;STDEV_POS=0.000 GT:GQ:DR:DV ./.:0:0:0

was filtered by Sniffles with COV_MIN filter (running with --no-qc).
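When inspecting a --no-qc VCF, it can help to pull the FILTER value and the COVERAGE values out of each record programmatically. The following is a minimal sketch using only the Python standard library; the function name `parse_sv_record` is hypothetical, and it splits on any whitespace only because the record is pasted here with spaces (real VCF files are tab-delimited).

```python
def parse_sv_record(line):
    """Extract FILTER and a few INFO fields from one Sniffles VCF record."""
    fields = line.split()  # pasted record uses spaces; VCF files use tabs
    info_map = {}
    for item in fields[7].split(";"):
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True  # flags such as PRECISE
    coverage = [
        None if v == "None" else float(v)
        for v in info_map.get("COVERAGE", "").split(",")
        if v
    ]
    return {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "id": fields[2],
        "filter": fields[6],
        "svtype": info_map.get("SVTYPE"),
        "svlen": int(info_map["SVLEN"]) if "SVLEN" in info_map else None,
        "coverage": coverage,
    }


# The record quoted above, with the elided REF sequence left as posted.
RECORD = (
    "chr14 88391507 Sniffles2.DEL.181SE gttgcat...caatttagttcttt N 60 COV_MIN "
    "PRECISE;SVTYPE=DEL;SVLEN=-31666;END=88423173;SUPPORT=15;"
    "COVERAGE=13,0,0,0,14;STRAND=+-;STDEV_LEN=0.000;STDEV_POS=0.000 "
    "GT:GQ:DR:DV ./.:0:0:0"
)

rec = parse_sv_record(RECORD)
print(rec["id"], rec["filter"], rec["svlen"], rec["coverage"])
```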

I then tried --minsupport 1 and also --long-del-coverage 1 to see whether there is any chance of getting this variant into the resulting VCF, but it is always filtered out. Which parameter do we have to adjust so that the variant shows up in the output?

Best, Stefan

stefandiederich avatar Nov 29 '22 13:11 stefandiederich

I had a similar issue for a 160kb DEL that was validated orthogonally (Illumina, PCR) and is clearly visible in the reads, though I do see that the coverage fluctuates near the centre of the deletion. I am trying --no-qc to see whether the variant is being filtered out, but would also +1 Stefan's request for other parameters to adjust for this specific problem.

Thanks, Phil

Phillip-a-richmond avatar Jan 05 '23 19:01 Phillip-a-richmond

Turns out my DEL is fully missed by Sniffles but picked up by CuteSV, even after adding --no-qc as suggested above:

Code:

sniffles --input $Proband_BAM \
	--vcf ${Proband_ID}_noQC.vcf.gz \
	--reference $Fasta_Dir/$Fasta_File \
	--no-qc \
	--snf ${Proband_ID}_noQC.snf

The Deletion in question: DH0808_CEP170

(I know that in this snapshot I didn't expand feature visibility for the default Sniffles track, but the deletion is not there either.) It is also not shown in the --no-qc file at all; the output VCF only contains these variants around our de novo deletion of interest:

chr1	243109028	Sniffles2.INS.2713S0	N	AAAATGCCTTCTTTTGCCTATTTTATTAAGGATGTAATAACCCTAATGGCCTTTCATGAAGAGCATTCTCTCCAAATGCATTGCACTGGGACACTCCCGAGGGTCCTGGGCCAACACACACTTATAACATAAAATGTAAAAGGGG	60	SUPPORT_MIN	PRECISE;SVTYPE=INS;SVLEN=145;END=243109028;SUPPORT=1;COVERAGE=22,22,22,22,22;STRAND=-;AF=0.045;STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=0	GT:GQ:DR:DV	0/0:48:21:1
chr1	243128302	Sniffles2.INS.2714S0	N	TGCAGGGAAAGCAATAACAAAAATTAGCCTACTTTTAGCTAAATGTTATCACTTTACAAGCAATGAATTTCACTCTCACTTTATTTGGAACACTTAATATTATCAT	60	SUPPORT_MIN	PRECISE;SVTYPE=INS;SVLEN=106;END=243128302;SUPPORT=1;COVERAGE=8,8,8,8,8;STRAND=-;AF=0.125;STDEV_LEN=0;STDEV_POS=0;SUPPORT_LONG=0	GT:GQ:DR:DV	0/0:9:7:1

From CuteSV (cutting out ref+alt cols because CuteSV puts the entire 164kb seq in the ref column...)

chr1 243119556 cuteSV.DEL.1178 CCAG... C . PASS PRECISE;SVTYPE=DEL;SVLEN=-164482;END=243284038;CIPOS=-0,0;CILEN=-0,0;RE=6;RNAMES=NULL;STRAND=+- GT:DR:DV:PL:GQ ./.:.:6:.,.,.:.
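One way to confirm that a --no-qc VCF contains no call of a given type in the CuteSV interval is to scan it for overlapping records. Below is a minimal standard-library sketch; the function name `overlapping_calls` is hypothetical, and the two example lines are the Sniffles INS records quoted above with their ALT sequences shortened to a `<INS>` placeholder for brevity.

```python
def overlapping_calls(vcf_lines, chrom, start, end, svtype=None):
    """Return (ID, FILTER, POS, END) for records overlapping chrom:start-end."""
    hits = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        if fields[0] != chrom:
            continue
        if svtype is not None and info.get("SVTYPE") != svtype:
            continue
        sv_start = int(fields[1])
        sv_end = int(info.get("END", sv_start))
        if sv_start <= end and sv_end >= start:
            hits.append((fields[2], fields[6], sv_start, sv_end))
    return hits


# Abbreviated copies of the two Sniffles records above (ALT is a placeholder).
lines = [
    "\t".join(["chr1", "243109028", "Sniffles2.INS.2713S0", "N", "<INS>", "60",
               "SUPPORT_MIN", "PRECISE;SVTYPE=INS;SVLEN=145;END=243109028",
               "GT:GQ:DR:DV", "0/0:48:21:1"]),
    "\t".join(["chr1", "243128302", "Sniffles2.INS.2714S0", "N", "<INS>", "60",
               "SUPPORT_MIN", "PRECISE;SVTYPE=INS;SVLEN=106;END=243128302",
               "GT:GQ:DR:DV", "0/0:9:7:1"]),
]

# Interval taken from the CuteSV call: chr1:243119556-243284038.
print(overlapping_calls(lines, "chr1", 243119556, 243284038, svtype="DEL"))
print(overlapping_calls(lines, "chr1", 243119556, 243284038))
```

For a compressed file such as the `_noQC.vcf.gz` output, the lines could be read with `gzip.open(path, "rt")` from the standard library and passed to the same function.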

@fritzsedlazeck if there is a test version of Sniffles you'd like me to try to get this variant to be detected let me know.

Thanks, Phil

Phillip-a-richmond avatar Jan 09 '23 21:01 Phillip-a-richmond

Dear all, thanks for the clear reports!

We have identified some things over the past weeks and will soon make a new release. In addition, we have identified the parameters that likely are causing these issues and will further optimize them. For this it would be fantastic if you could share some of these regions (bam file) with me: [email protected]. I know data sharing is often tricky, but I hope to obtain these regions (+/-2kbp) to make sure that won't happen anymore in the future!

@smolkmo is also adding further debug options (e.g. read tracing) so we can more easily see why Sniffles is ignoring certain reads/regions.

Thank you all, Fritz

fritzsedlazeck avatar Jan 10 '23 02:01 fritzsedlazeck

Hello, this is my structural variant VCF output. Why is the reference sequence (REF column) given as N for every record? Why is the FILTER column always PASS? I would also like to know where to look up the filter options and conditions of the Sniffles software.

##fileformat=VCFv4.2
##source=Sniffles2_2.0.7
##command="/data/dongjie/anaconda3/envs/dj/bin/sniffles -i /data/dongjie/ONT/output/JD17-HN35mapped.sorted-4.bam -v /data/dongjie/ONT/output/JD17-HN35variants-4.vcf"
##fileDate="2022/12/08 04:45:22"
##contig=<ID=Chr01,length=59293188>
##contig=<ID=Chr02,length=52595666>
##contig=<ID=Chr03,length=47832972>
##contig=<ID=Chr04,length=52830390>
##contig=<ID=Chr05,length=43637999>
##contig=<ID=Chr06,length=52199335>
##contig=<ID=Chr07,length=47014944>
##contig=<ID=Chr08,length=49700108>
##contig=<ID=Chr09,length=50106997>
##contig=<ID=Chr10,length=54367898>
##contig=<ID=Chr11,length=41062601>
##contig=<ID=Chr12,length=43377035>
##contig=<ID=Chr13,length=46482933>
##contig=<ID=Chr14,length=52685030>
##contig=<ID=Chr15,length=53594103>
##contig=<ID=Chr16,length=39861124>
##contig=<ID=Chr17,length=42923138>
##contig=<ID=Chr18,length=60455165>
##contig=<ID=Chr19,length=52277988>
##contig=<ID=Chr20,length=50612565>
##contig=<ID=Chr21,length=28839>
##contig=<ID=Chr22,length=54539>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
Chr01	4372	Sniffles2.DEL.1833S0	N	<DEL>	58	PASS	PRECISE;SVTYPE=DEL;SVLEN=-47;END=4419;SUPPORT=3;COVERAGE=4,4,4,4,4;STRAND=-;AF=0.750;STDEV_LEN=0.577;STDEV_POS=0.577	GT:GQ:DR:DV	0/1:1:1:3
Chr01	7271	Sniffles2.DEL.1838S0	N	<DEL>	58	PASS	PRECISE;SVTYPE=DEL;SVLEN=-428;END=7699;SUPPORT=3;COVERAGE=4,4,4,4,7;STRAND=-;AF=0.750;STDEV_LEN=2.082;STDEV_POS=12.662	GT:GQ:DR:DV	0/1:1:1:3
Chr01	9454	Sniffles2.DEL.183BS0	N	<DEL>	58	PASS	PRECISE;SVTYPE=DEL;SVLEN=-41;END=9495;SUPPORT=4;COVERAGE=7,6,6,6,4;STRAND=+-;AF=0.667;STDEV_LEN=0.000;STDEV_POS=0.000	GT:GQ:DR:DV	0/1:8:2:4
Chr01	84948	Sniffles2.DUP.7C85S0	N	<DUP>	60	PASS	PRECISE;SVTYPE=DUP;SVLEN=31856;END=116804;SUPPORT=13;COVERAGE=40,40,80,25,25;STRAND=+-;AF=0.310;STDEV_LEN=3.240;STDEV_POS=0.000	GT:GQ:DR:DV	0/1:49:29:13
Chr01	108653	Sniffles2.DEL.185ES0	N	<DEL>	60	PASS	PRECISE;SVTYPE=DEL;SVLEN=-6003;END=114656;SUPPORT=10;COVERAGE=62,27,29,25,38;STRAND=+-;AF=0.370;STDEV_LEN=2.340;STDEV_POS=2.340	GT:GQ:DR:DV	0/1:52:17:10
Chr01	738346	Sniffles2.INS.2BS0	N	ATATATATATATATATATATATATATATATATATAT	60	PASS	PRECISE;SVTYPE=INS;SVLEN=36;END=738346;SUPPORT=11;COVERAGE=22,22,20,21,21;STRAND=+-;AF=0.550;STDEV_LEN=2.498;STDEV_POS=20.410;SUPPORT_LONG=0	GT:GQ:DR:DV	0/1:59:9:11

Best wishes to you. Sincerely yours, Dong Yajun

yajun1314 avatar Jan 13 '23 01:01 yajun1314