
Bcftools roh output

Open purohitshilp opened this issue 2 years ago • 1 comments

Dear Bcftools ROH team,

I'm in the process of detecting ROH in my samples' WGS data.

I have VCF files for a few hundred samples (from the same population), which I merged into one file, and I calculated the allele frequencies (AFs) by following the exact steps described on the howto page.

As for the genetic map, I downloaded recombination rates for the GRCh38 assembly from https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/tables/genetic_map_hg38_withX.txt.gz since a genetic map for GRCh38 wasn't available on the IMPUTE2 site.

Here is the command that I run:

bcftools roh --AF-file AFs.tab.gz --genetic-map geneticmap_grch38_{CHROM}_split.txt -M 100 -o roh_mysample_output.txt mysample.hard-filtered.vcf.gz

I've got results from bcftools; however, I'm getting exactly one RG (region) record per chromosome, which baffles me. These ROHs range from 40 Mb to 250 Mb. The output also contains ST lines with the per-position state, but I intend to detect complete ROHs, preferably multiple ROHs on each chromosome. Am I missing something here? I also tweaked a few parameters, but I'm getting more or less the same results.

Does this have something to do with the genetic map, given that it contains recombination rates for only 3M SNPs?
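To make the "one RG per chromosome" observation concrete, the region records can be tallied per chromosome. The following is a minimal sketch, not part of the original post: `summarize_roh` is a hypothetical helper name, and it assumes the tab-separated RG layout that bcftools roh documents in its output header (RG, sample, chromosome, start, end, length in bp, number of markers, quality).

```shell
# Hypothetical helper: summarise RG (region) records from a bcftools roh output file.
# Assumed columns: RG, sample, chromosome, start, end, length (bp), markers, quality.
# Prints: chromosome, number of regions, total ROH length (bp), longest ROH (bp).
summarize_roh() {
    awk -F'\t' '
        $1 == "RG" {
            n[$3]++                           # regions seen on this chromosome
            total[$3] += $6                   # summed region length
            if ($6 > max[$3]) max[$3] = $6    # longest single region
        }
        END {
            for (c in n) printf "%s\t%d\t%d\t%d\n", c, n[c], total[c], max[c]
        }
    ' "$1" | sort
}
```

For example, `summarize_roh roh_mysample_output.txt` would print one line per chromosome. A region count of 1 with a length close to the chromosome size would confirm that the whole chromosome is being reported as a single ROH.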

Suggestions are highly appreciated.

Cheers, Shilp

purohitshilp, Mar 08 '23 19:03

This is a very difficult question to answer; it really depends on the specifics of the data: site density, the number of heterozygous and homozygous genotypes, allele frequencies at the called sites, and the genetic map. All of these enter the calculation, and without seeing the actual data one cannot be sure where the problem is. If you can share a small part of the data as a reproducible test case, I can take a look.
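Two of the factors listed above, site density and the heterozygous/homozygous genotype counts, can be checked quickly before sharing data. This is a rough sketch only: `count_genotypes` is a hypothetical helper, and it assumes a single-sample VCF whose genotypes are streamed as tab-separated CHROM, POS, GT lines, for instance with `bcftools query -f '%CHROM\t%POS[\t%GT]\n'`.

```shell
# Hypothetical helper: count called sites and hom/het genotypes per chromosome.
# Reads tab-separated "CHROM POS GT" lines on stdin (single sample assumed).
# Prints: chromosome, called sites, homozygous calls, heterozygous calls.
count_genotypes() {
    awk -F'\t' '
        {
            split($3, a, "[/|]")              # handle unphased (/) and phased (|) genotypes
            # skip missing or haploid calls
            if (a[1] == "." || a[2] == "." || a[2] == "") next
            sites[$1]++
            if (a[1] == a[2]) hom[$1]++; else het[$1]++
        }
        END {
            for (c in sites) printf "%s\t%d\t%d\t%d\n", c, sites[c], hom[c] + 0, het[c] + 0
        }
    ' | sort
}
```

A possible invocation is `bcftools query -f '%CHROM\t%POS[\t%GT]\n' mysample.hard-filtered.vcf.gz | count_genotypes`. Note that the homozygous count includes hom-ref calls if the VCF contains them.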

pd3, Mar 20 '23 10:03