sansa
Representation of breakpoint positions
I built a structural variant analysis pipeline with delly and sansa and tried to evaluate its accuracy using NCCL validation data.
Delly detected the expected variants, but I am a bit surprised to see a 1 bp difference in how the breakpoints are represented.
answer:
The sansa result:
[1]ANNOID | query.chr | query.start | query.chr2 | query.end | query.id | query.qual | query.svtype | query.ct | query.svlen | query.startfeature | query.endfeature |
---|---|---|---|---|---|---|---|---|---|---|---|
None | 7 | 55266405 | 7 | 92462404 | INV00002766 | 7260 | INV | 3to3 | 37195999 | EGFR(0;+) | CDK6(0;-) |
None | 22 | 23632600 | 9 | 133729450 | BND00012515 | 10000 | BND | 5to3 | 0 | BCR(0;+) | ABL1(0;+) |
The corresponding records in delly.bcf:
22 23632600 BND00012515 A ]9:133729450]A 10000 PASS PRECISE;SVTYPE=BND;SVMETHOD=EMBL.DELLYv1.1.6;END=23632601;CHR2=9;POS2=133729450;PE=431;MAPQ=60;CT=5to3;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=60;INSLEN=0;HOMLEN=3;SR=56;SRQ=1;CONSENSUS=AGAATAAAACTAATTTTTTCTCCCAATTTTCTCTTCCTTTTTCTTTTTTCTGTTCCCCCCTTTCTCTTCCAGAGTAAGTACTGGTTTGGGGAGGAGGGTTGCAGCGGCCGAGCCAGGGTCTCCACCCAGGAAGGACTCATCGGGCAGGGTGTGGGGAA;CE=1.97658;RDRATIO=1;SOMATIC GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-253.075,0,-455.964:10000:PASS:222980:329282:106302:2:748:431:155:95 0/0:0,-75.2007,-857.843:10000:PASS:221127:323350:102223:2:1302:0:250:0
7 55266405 INV00002766 C <INV> 7260 PASS PRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv1.1.6;END=92462404;PE=101;MAPQ=60;CT=3to3;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=0;HOMLEN=1;SR=20;SRQ=1;CONSENSUS=TAATGATGACTAAAGCAAGGGATTGTGATTGTTCATTCATGATCCCACTGCCTTCTTTTCTTGCTTCATCCTCGTGAGCCAGGGAGCTGCGCCCTCGCCATCTGGGGCCTCGCGCGCG;CE=1.97084;RDRATIO=0.997411;SOMATIC GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-193.272,0,-528.756:10000:PASS:197626:345940:220737:2:577:101:173:77 0/0:0,-75.2167,-873.159:10000:PASS:197576:346736:220664:2:768:0:250:0
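To make the compared coordinates explicit, here is a minimal sketch that pulls the breakpoint fields out of records like the two above. The function names `parse_info` and `breakpoint_summary` are hypothetical helpers, not part of delly or sansa. The INFO strings are abbreviated copies of the records (CONSENSUS and other fields dropped). Treating CIPOS as offsets relative to POS follows the VCF convention; applying CIEND to POS2 for the BND record is my assumption.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def breakpoint_summary(chrom, pos, info):
    """Return both breakpoints with their confidence intervals.

    Hypothetical helper: for BND records the second breakpoint is taken
    from CHR2/POS2, otherwise from END on the same chromosome.
    Assumption: CIEND offsets are applied to POS2 for BND records.
    """
    fields = parse_info(info)
    ci_lo, ci_hi = (int(x) for x in fields.get("CIPOS", "0,0").split(","))
    ce_lo, ce_hi = (int(x) for x in fields.get("CIEND", "0,0").split(","))
    if fields["SVTYPE"] == "BND":
        chrom2, pos2 = fields["CHR2"], int(fields["POS2"])
    else:
        chrom2, pos2 = chrom, int(fields["END"])
    return {
        "start": (chrom, pos, pos + ci_lo, pos + ci_hi),
        "end": (chrom2, pos2, pos2 + ce_lo, pos2 + ce_hi),
        "homlen": int(fields.get("HOMLEN", 0)),
    }


# INFO strings abbreviated from the two delly records above.
INV_INFO = "PRECISE;SVTYPE=INV;END=92462404;CT=3to3;CIPOS=-2,2;CIEND=-2,2;HOMLEN=1"
BND_INFO = ("PRECISE;SVTYPE=BND;END=23632601;CHR2=9;POS2=133729450;"
            "CT=5to3;CIPOS=-3,3;CIEND=-3,3;HOMLEN=3")

inv = breakpoint_summary("7", 55266405, INV_INFO)
bnd = breakpoint_summary("22", 23632600, BND_INFO)
print(inv)
print(bnd)
```

Each tuple is (chromosome, reported position, lower bound, upper bound), so the reported positions can be compared directly with what is visible in IGV.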
Taking INV00002766 as an example: in the IGV image, soft clipping can be seen on chromosome 7
between 55266405 and 55266406;
between 92462404 and 92462405.
Perhaps this is just a small detail of how delly describes breakpoints? I am mainly concerned about whether a conversion between 1-based and 0-based coordinates is involved. I understand that SNV/indel annotation is required to follow the 3' rule of the transcript. Is there a similar requirement for structural variants?
BND00012515 is particularly confusing to me, because the breakpoint appears to lie on chr22 between 23632659 and 23632661 and on chr9 between 133729450 and 133729452. Why was the current breakpoint chosen?