Cutoff for duplications

Open robertzeibich opened this issue 2 years ago • 6 comments

The cutoff for deletions is DHFFC 0.7. What is the recommended DHBFC cutoff for duplications?

robertzeibich avatar Sep 01 '23 19:09 robertzeibich

Hi, duplications are harder, but 1.3 is a reasonable start.
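As a rough illustration of how these two starting cutoffs could be applied per sample, here is a minimal Python sketch. It is not part of duphold: the helper names `sample_fields` and `passes_depth_filter` are hypothetical, the FORMAT/sample strings are made-up example values, and it assumes the VCF has already been annotated with the DHFFC and DHBFC FORMAT fields.

```python
def sample_fields(format_col, sample_col):
    """Map FORMAT keys (e.g. GT:DHFFC:DHBFC) to one sample's values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def passes_depth_filter(svtype, dhffc, dhbfc,
                        del_max_dhffc=0.7, dup_min_dhbfc=1.3):
    """Return True if the depth fold-change supports the called SV type.

    The defaults are the starting points mentioned in this thread;
    they are meant to be tuned, not taken as fixed recommendations.
    """
    if svtype == "DEL":
        return dhffc < del_max_dhffc
    if svtype == "DUP":
        return dhbfc > dup_min_dhbfc
    return True  # other SV types are not depth-filtered in this sketch


# Made-up example values for a single sample column of a DUP record.
fields = sample_fields("GT:DHFFC:DHBFC", "0/1:1.42:1.18")
keep = passes_depth_filter("DUP", float(fields["DHFFC"]), float(fields["DHBFC"]))
print(keep)
```

Keeping the cutoffs as keyword arguments makes it easy to try several values and compare the calls that each one removes.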

brentp avatar Sep 01 '23 19:09 brentp

Detecting duplications is harder. I'm unsure whether I should use DHFFC > 1.3 or DHBFC > 1.3. After population genotyping and duphold, I found that some duplications (0/1, 1/1) have DHBFC < 1.3 but DHFFC > 1.3 in the 30x WGS data, and the Samplot results confirm that these calls are real. Could you give me some advice?

Qijie0615 avatar Jan 16 '24 12:01 Qijie0615

As you find, it's hard to come up with a good cutoff for duplications. The 1.2 cutoff might work in many cases, but would miss when there is already a large cassette that adds a single copy in a tandem dup. You'll have to experiment with what works.

brentp avatar Jan 17 '24 12:01 brentp

Thanks for the quick reply.

  1. I want to confirm that 1.2 means DHBFC > 1.2.
  2. I'm sorry, I don't understand this sentence: “it would miss when there is already a large cassette that adds a single copy in a tandem dup”. What does "cassette" mean?
  3. I would like to use DHBFC>1.2 to further filter the population genotyping data and reduce the false positive rate. Do you think this is a good idea?

Qijie0615 avatar Jan 17 '24 14:01 Qijie0615

Thanks for the quick reply.

  1. I want to confirm that 1.2 means DHBFC > 1.2.

Yes, you could try this.

  2. I'm sorry, I don't understand this sentence: “it would miss when there is already a large cassette that adds a single copy in a tandem dup”. What does "cassette" mean?

I mean if you have a tandem duplication with 10 copies and then you add another single copy, you only expect a 10% increase in depth.
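To make that arithmetic concrete, here is a small Python sketch of the simplified model described above: going from n copies to n + 1 copies gives an expected depth ratio of (n + 1) / n. The function name `expected_fold_change` is hypothetical, and the model deliberately ignores zygosity, ploidy, and coverage noise.

```python
def expected_fold_change(existing_copies, added_copies=1):
    """Expected depth ratio after adding copies to an existing tandem array.

    Simplified model: depth scales linearly with copy number, so going
    from n copies to n + k copies multiplies depth by (n + k) / n.
    """
    return (existing_copies + added_copies) / existing_copies


# Compare the expected ratio against a 1.2 cutoff for several array sizes.
for n in (1, 2, 5, 10):
    fc = expected_fold_change(n)
    print(f"{n} existing copies: fold-change {fc:.2f}, above 1.2: {fc > 1.2}")
```

With 10 existing copies the expected ratio is about 1.1, so a cutoff of 1.2 would reject a real single-copy gain.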

  3. I would like to use DHBFC>1.2 to further filter the population genotyping data and reduce the false positive rate. Do you think this is a good idea?

It's worth trying, but you'll have to evaluate for yourself how effective it is. If you have trios, you can look at Mendelian violations and transmissions. Otherwise, you can look at samplots of the variants that are filtered out.
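For the trio-based evaluation, one possible approach is to count how many calls are Mendelian-consistent before and after filtering. The Python sketch below checks a single biallelic diploid genotype triple; the function names `alleles` and `is_mendelian_consistent` are hypothetical, and it assumes VCF-style GT strings such as 0/1 or 1|1 on autosomes.

```python
from itertools import product


def alleles(gt):
    """Split a VCF GT string such as '0/1' or '1|0' into its alleles."""
    return gt.replace("|", "/").split("/")


def is_mendelian_consistent(child, father, mother):
    """Return True/False for consistency, or None if any genotype is missing.

    A child genotype is consistent if it can be formed by taking one
    allele from the father and one allele from the mother.
    """
    child_alleles = sorted(alleles(child))
    if "." in child_alleles + alleles(father) + alleles(mother):
        return None  # cannot evaluate a trio with missing genotypes
    return any(
        sorted(pair) == child_alleles
        for pair in product(alleles(father), alleles(mother))
    )


# A het call in the child with both parents hom-ref is a violation
# (or a de novo event); a het call inherited from the father is consistent.
print(is_mendelian_consistent("0/1", "0/0", "0/0"))
print(is_mendelian_consistent("0/1", "0/1", "0/0"))
```

If a cutoff removes mostly violating calls while keeping transmitted ones, that suggests it is removing false positives rather than real duplications.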

brentp avatar Jan 17 '24 14:01 brentp

Thank you for your quick reply.

Qijie0615 avatar Jan 18 '24 02:01 Qijie0615