cuteSV

repeat at the SV breakpoint

Open charliechen912ilovbash opened this issue 1 year ago • 4 comments

Hi, I'm wondering: if there is a repeat sequence (e.g., a simple repeat) at an SV (e.g., deletion) breakpoint, will it affect the accuracy of the SV position? If so, how does cuteSV v1.0.12 overcome this issue?

charliechen912ilovbash, Jul 03 '23 17:07

Hello @charliechen912ilovbash,

Sorry for the late reply. It is well known that repeat sequences disturb the alignment, which leads to inaccurate breakpoints on individual reads. SV callers collect the breakpoints from each read to infer SV candidates, so treating these inaccurate breakpoints directly as SV signatures would produce low-quality SV positions. To overcome this, cuteSV clusters all breakpoint signatures within a relatively small region to generate "consensus" SV breakpoint groups, then divides them into possible SV events using length signatures. After that, it reports the final SV calls and the corresponding genotypes. For more details, please read our paper here. I hope this is helpful to you.
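To make the idea above concrete, here is a minimal sketch of position clustering followed by a length-based split. It is not cuteSV's actual code: the `Signature` class, the function names, and the thresholds (`max_dist`, `max_ratio`, `min_support`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Signature:
    pos: int     # breakpoint position reported by one read
    length: int  # SV length reported by the same read


def cluster_by_position(sigs, max_dist=200):
    """Group signatures whose sorted positions lie within max_dist of the previous one."""
    groups = []
    for sig in sorted(sigs, key=lambda s: s.pos):
        if groups and sig.pos - groups[-1][-1].pos <= max_dist:
            groups[-1].append(sig)
        else:
            groups.append([sig])
    return groups


def split_by_length(group, max_ratio=0.3):
    """Split one positional group into sub-groups of similar SV length."""
    subgroups = []
    for sig in sorted(group, key=lambda s: s.length):
        prev = subgroups[-1][-1].length if subgroups else None
        if prev is not None and sig.length - prev <= max_ratio * prev:
            subgroups[-1].append(sig)
        else:
            subgroups.append([sig])
    return subgroups


def consensus_calls(sigs, min_support=2):
    """Return (consensus position, consensus length, support) for each retained sub-group."""
    calls = []
    for group in cluster_by_position(sigs):
        for sub in split_by_length(group):
            if len(sub) >= min_support:
                pos = int(median(s.pos for s in sub))
                length = int(median(s.length for s in sub))
                calls.append((pos, length, len(sub)))
    return calls
```

In this sketch, taking the median across reads means that a few breakpoints shifted by a repeat have limited influence on the reported position, while the length split keeps two different events at the same locus from being merged.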

Best regards, Tao

tjiangHIT, Sep 01 '23 00:09

Hi, Tao

But for assembly-based SV calling, does cuteSV still cluster breakpoints? Since there is only one read in the SAM file, is it possible for cuteSV to report these breakpoints?

baozg, Sep 05 '23 15:09

Hello @baozg,

Thanks for pointing this out. Actually, cuteSV achieves assembly-based SV calling by converting the typical SV callsets into diploid-based SV callsets. That is, cuteSV first generates the initial SV callsets using the clustering approach mentioned above (there can still be more than one SV signature at some loci, even though there is only one contig per haplotype). Then cuteSV resolves the haplotype tags for each SV call to produce a phased genotype.
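As a rough illustration of the last step, the sketch below maps the haplotype tags of the contigs supporting a call to a phased genotype string. It assumes contigs are labelled with haplotype tags `1` and `2`; the function name and this tagging scheme are hypothetical, not cuteSV's actual interface.

```python
def phased_genotype(hap_tags):
    """Map the set of supporting haplotype tags (1 and/or 2) to a phased GT string."""
    on_hap1 = 1 in hap_tags  # call supported by the haplotype-1 contig
    on_hap2 = 2 in hap_tags  # call supported by the haplotype-2 contig
    return f"{int(on_hap1)}|{int(on_hap2)}"
```

Under these assumptions, a call seen only on the haplotype-1 contig would be reported as `1|0`, and one seen on both contigs as `1|1`.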

Tao

tjiangHIT, Sep 06 '23 00:09

Hi, Tao

But for an inbred plant or a haploid human cell line, such as A. thaliana or CHM13, there is only one haplotype. Does this case also need a clustering step?

Besides, as you mentioned, if I want to call variants with cuteSV from population-level assemblies, it would be better to put all the assemblies in one alignment file so that the clustering step can refine the breakpoints, right?

Zhigui

baozg, Sep 06 '23 11:09