germline filter option and cohort level analysis

Open nitha26 opened this issue 3 years ago • 4 comments

Hi, I'm running Delly version 8.5 on CentOS to detect germline SVs from NovaSeq short reads. A few questions have come up while using Delly.

  1. When running germline SV calling for a single sample, how do I use the germline filter option, i.e. "delly filter -f germline -o germline.bcf merged.bcf"? Here merged.bcf is a merged file containing information from different samples, so how should this filter be used for a single sample? I ask because the Delly README says "Apply the germline SV filter which requires at least 20 unrelated samples".

  2. Suppose I have to do SV calling for 500 samples. Delly germline SV calling is initially done for 200 samples. When more samples are added later (for population analysis), do I have to redo everything from the 2nd step, "delly merge -o sites"?

  3. If I have completed the whole germline SV calling for 200 samples and later need to analyze only 100 samples (or some particular samples), can I use the "sites.bcf" generated from the 200 samples for the -v option? (delly call -g hg19.fa -v sites.bcf -o s1.geno.bcf -x hg19.excl s1.bam)

  4. In the merging step, do you recommend any particular values for the min. and max. distance (bp) between breakpoints? And what are the default filter values used when calling germline SVs?

Thank you.

nitha26 avatar Feb 24 '21 11:02 nitha26

(1) For a single sample you cannot apply delly filter, as written in the README. (2) Adding samples to an already existing genotyped BCF file is indeed tricky. We don't have a standard workflow for this. (3) Subsetting samples is fine. You can always use bcftools view for that and then require --min-ac 1. (4) The defaults should be fine for a standard Illumina paired-end library.

tobiasrausch avatar Mar 06 '21 09:03 tobiasrausch
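
The subsetting suggested in (3) could look roughly like the sketch below. It only builds the bcftools view command line and does not run it. The file names and sample IDs are placeholders that do not come from this thread; -s (sample list), --min-ac, -O and -o are regular bcftools view options.

```python
import shlex


def subset_command(merged_bcf, out_bcf, samples):
    """Build a bcftools view call that keeps only `samples` and requires
    an alternate allele count of at least 1 among the kept samples."""
    return [
        "bcftools", "view",
        "-s", ",".join(samples),   # comma-separated samples to keep
        "--min-ac", "1",           # minimum alternate allele count
        "-O", "b", "-o", out_bcf,  # write compressed BCF to out_bcf
        merged_bcf,
    ]


# Placeholder inputs, for illustration only.
cmd = subset_command("merged.bcf", "subset.bcf", ["s1", "s2", "s3"])
print(shlex.join(cmd))
```

Building the call as an argument list keeps it easy to pass to subprocess.run later without shell quoting problems.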

Thanks for answering, Tobias, but I still need more information.

  1. Yes, I saw that in the README. But my question is: why is the -f germline filter applied only when there are > 20 unrelated samples? How does this filter help in calling SVs in those samples compared to single-sample detection? For example, if I run Delly germline SV calling for the NA12878 sample without applying the germline filter, how does Delly detect germline SVs? (I understand that Delly calls SVs based on paired-end, split-read, and read-depth signals, but what is the main idea behind the -f germline filter when we have > 20 unrelated samples? Could you explain how this filter works?)

  2. "Adding samples to an already existing genotyped BCF file is indeed tricky. We don't have a standard workflow for this." I have read that Delly was used to detect SVs in the 2504 samples of 1kGP Phase 3 (and in some other large-scale studies). In that case, I guess it was not possible to call all 2504 samples at once, so there must be some other way to do it. I need to do a population analysis for my study, which is why I am trying to figure out this step. For population-level SNV calling, GATK offers joint calling. Could you guide me on how to do a population study for SVs?

Thank you.

nitha26 avatar Mar 07 '21 06:03 nitha26

(1) The germline filter only helps with specificity, not sensitivity. (2) If all the samples are available you can run the standard germline workflow. Please see the README.

tobiasrausch avatar Mar 07 '21 11:03 tobiasrausch
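
For orientation, the standard germline workflow from the README can be laid out as the sketch below. It is a minimal illustration, not an official script: it only assembles the command lines quoted in this thread and in the README (per-sample calling, site merging, re-genotyping with -v, bcftools merge, germline filter). The sample IDs, BAM names, and the helper name germline_workflow are hypothetical.

```python
def germline_workflow(genome, exclude, bams):
    """Return the germline workflow commands as argument lists.

    `bams` maps a sample ID to its BAM/CRAM path (hypothetical names).
    """
    cmds = []
    # Step 1: SV discovery, one run per sample.
    for sid, bam in bams.items():
        cmds.append(["delly", "call", "-g", genome,
                     "-o", f"{sid}.bcf", "-x", exclude, bam])
    # Step 2: merge the per-sample calls into one unified site list.
    cmds.append(["delly", "merge", "-o", "sites.bcf"]
                + [f"{sid}.bcf" for sid in bams])
    # Step 3: genotype the merged site list in every sample.
    for sid, bam in bams.items():
        cmds.append(["delly", "call", "-g", genome, "-v", "sites.bcf",
                     "-o", f"{sid}.geno.bcf", "-x", exclude, bam])
    # Step 4: combine the genotyped samples into one multi-sample BCF.
    cmds.append(["bcftools", "merge", "-m", "id", "-O", "b",
                 "-o", "merged.bcf"]
                + [f"{sid}.geno.bcf" for sid in bams])
    # Step 5: germline filter (needs at least 20 unrelated samples).
    cmds.append(["delly", "filter", "-f", "germline",
                 "-o", "germline.bcf", "merged.bcf"])
    return cmds


# Twenty placeholder samples, the minimum the germline filter asks for.
bams = {f"s{i}": f"s{i}.bam" for i in range(1, 21)}
commands = germline_workflow("hg19.fa", "hg19.excl", bams)
for c in commands:
    print(" ".join(c))
```

Because step 3 depends on the site list from step 2, adding new samples later changes sites.bcf, which is why extending an already genotyped cohort is not a simple append.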

@tobiasrausch, after merging the SV sites (2nd step, delly merge), I performed genotyping (3rd step) for 320 samples with the following command, launched from Python with MPI:
delly call -g GRCh38.fa -v 320samples_sites.bcf -o "+a[int]+".geno.bcf -x human.hg38.excl.tsv "+a[int]+".cram"
This generated 320 genotype VCF files.
I have also predicted SVs for the same 320 samples with two other tools, so I need to merge each sample's SVs from the different tools into a single VCF to find a consensus.

Since I am planning to merge the calls and find a consensus across samples, my questions are:

  1. For merging, I take the "s1.geno.bcf" output of the 3rd step for each sample and convert it to a VCF file using bcftools view. I then use this s1.geno.vcf for merging, i.e. sampl1_lumpy.vcf, sampl1_manta.vcf, and sampl1.dellygeno.vcf are merged into a single VCF (and likewise for the other samples). Am I using the correct Delly SV file (s1.geno.bcf) for this analysis?
  2. I am confused because when I checked the count for each genotype file (s1.geno.bcf, s2.geno.bcf, ...), every sample's genotype file had the same count. Why is that? Am I doing any step wrong?

E.g. wc -l s1.geno.vcf gives 1435401 and wc -l s2.geno.vcf also gives 1435401, and the same holds for all 320 genotype files. Why is that? I also tried **bcftools view** and got the same count.

Sorry if my questions sound silly, but your explanation will greatly help me improve my understanding.

Thanks.

nitha26 avatar Mar 11 '21 12:03 nitha26
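
One plausible reading of the identical counts, not confirmed by the maintainer in this thread: genotyping with -v writes a record for every site in the sites file, so each per-sample file has the same number of lines whatever the genotype is, and wc -l also counts the header lines. A per-sample count only becomes informative once records with a reference (0/0) or missing (./.) genotype are set aside. The sketch below illustrates that idea on plain VCF text; the function name and the three toy records are made up for illustration.

```python
def count_carried_records(vcf_lines, sample_index=0):
    """Return (total_records, records_where_sample_has_alt_allele)."""
    total = carried = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        format_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        genotype = sample_values[format_keys.index("GT")]
        alleles = genotype.replace("|", "/").split("/")
        # A record is carried if any allele is neither reference nor missing.
        if any(a not in ("0", ".") for a in alleles):
            carried += 1
    return total, carried


# Toy single-sample VCF, invented for this example.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1",
    "chr1\t1000\tDEL0001\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:GQ\t0/0:30",
    "chr1\t5000\tDEL0002\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:GQ\t0/1:45",
    "chr2\t7000\tINV0001\tN\t<INV>\t.\tPASS\tSVTYPE=INV\tGT:GQ\t./.:0",
]
print(count_carried_records(example))  # (3, 1)
```

If this reading is right, the per-sample file to compare against other callers would be the genotyped file restricted to records the sample actually carries, rather than the full site list.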