Using pixy without an all sites VCF?

Open milesandersonmn opened this issue 11 months ago • 1 comments

I have a fairly mundane need for pi and FST estimates that are in the ballpark but not necessarily the most accurate possible. We have a huge number of samples, and I don't have the time or resources to generate individual gVCFs for them. Can I use pixy on a standard (variants-only) VCF without calling all sites? Is there anything I should know when doing this?

If not, are there any tools you would recommend as an alternative?

Thanks!

milesandersonmn, Mar 18 '24 11:03

Hi Miles,

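There isn't a quick way around the missing data issue for pi/dxy, I'm afraid. All tools, including pixy, will give you biased estimates in the absence of an all-sites VCF, because a variants-only VCF can't distinguish sites that are truly invariant from sites that simply have no data, so the denominator of the estimate is wrong. Note that FST doesn't have the same issue, and any tool will work for that.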

The only alternative to the true 'all-sites' workflow that I am aware of is to use mop (https://github.com/RILAB/mop) on your BAM files, and use those results to ballpark the denominators for the estimates.
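To make the "denominator" point concrete, here is a minimal Python sketch. It is not pixy's implementation and does not call mop; it assumes you already have allele counts at the variant sites in a window and a count of callable sites for that window (for example, derived from mop's output). The function names `site_pi` and `window_pi` are made up for illustration.

```python
from math import comb


def site_pi(allele_counts):
    """Per-site nucleotide diversity from allele counts at one variant site.

    Computed as the fraction of pairwise comparisons between sampled
    alleles that differ.
    """
    n = sum(allele_counts)
    if n < 2:
        return 0.0  # fewer than two alleles: no comparisons possible
    same = sum(comb(c, 2) for c in allele_counts)
    return 1.0 - same / comb(n, 2)


def window_pi(variant_allele_counts, n_callable_sites):
    """Approximate windowed pi.

    Numerator: summed per-site pi over the variant sites in the window.
    Denominator: callable sites in the window (variant + invariant),
    which is an assumed input here, e.g. ballparked from BAM coverage.
    """
    if n_callable_sites <= 0:
        raise ValueError("n_callable_sites must be positive")
    total = sum(site_pi(ac) for ac in variant_allele_counts)
    return total / n_callable_sites


# Two variant sites, 8 sampled alleles each, in a window where
# 1000 sites were judged callable (hypothetical numbers).
counts = [(6, 2), (4, 4)]
pi_callable = window_pi(counts, 1000)
# Dividing by the variant sites alone shows how far off the estimate
# is when invariant sites are left out of the denominator.
pi_variants_only = window_pi(counts, len(counts))
print(pi_callable, pi_variants_only)
```

This only ballparks things: it treats every callable site as having complete genotype data, which is exactly the assumption the all-sites workflow avoids.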

Sorry that I can't be of more help.

Kieran

ksamuk, Mar 18 '24 14:03