tsinfer

No sites used for inference

Open bgyuris opened this issue 1 year ago • 4 comments

Dear tskit-dev team!

I always run into this issue when running tsinfer:

Number of polymorphic sites: 3906560
No sites used for inference
No sites used for inference
Inferred tree sequence 'sparrow_ts': 2 trees over 249.23985 Mb

I have used everything like in this tutorial: https://tskit.dev/tsinfer/docs/stable/tutorial.html#reading-a-vcf

I really do not understand the problem. Maybe it is related to my VCF somehow?

Great tool anyway. Thank you for the help!

bgyuris avatar Nov 09 '23 10:11 bgyuris

Hi @bgyuris! The requirements for a site to be used are listed at https://tskit.dev/tsinfer/docs/stable/inference.html#data-requirements

Note that the site must have a known ancestral allele and be biallelic. Are your sites valid?

benjeffery avatar Nov 09 '23 12:11 benjeffery
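
As a rough way to test the two requirements named above (a known ancestral allele and exactly two alleles), the sketch below inspects raw VCF data lines in plain Python. It is not tsinfer code. It assumes the ancestral allele is stored under the INFO key `AA`, which may not be true of every VCF, and the helper names `parse_vcf_site` and `site_problems` are made up for this example.

```python
def parse_vcf_site(line):
    """Return (alleles, ancestral_allele) for one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    ref, alt, info = fields[3], fields[4], fields[7]
    # ALT is "." when no alternate allele was observed at the site.
    alleles = [ref] + [a for a in alt.split(",") if a != "."]
    ancestral = None
    # Assumption: the ancestral allele is recorded as AA=<allele> in INFO.
    for entry in info.split(";"):
        if entry.startswith("AA="):
            value = entry[3:].upper()  # some datasets use lower case for low confidence
            if value not in ("", ".", "N", "-"):
                ancestral = value
    return alleles, ancestral


def site_problems(line):
    """List the reasons this site fails the two requirements (empty if none)."""
    alleles, ancestral = parse_vcf_site(line)
    problems = []
    if len(alleles) != 2:
        problems.append("not biallelic")
    if ancestral is None:
        problems.append("no known ancestral allele")
    return problems


# Three made-up data lines: usable, triallelic, and missing the AA entry.
good = "1\t100\t.\tA\tG\t.\tPASS\tAA=A\tGT\t0|1\t1|0"
triallelic = "1\t200\t.\tC\tG,T\t.\tPASS\tAA=C\tGT\t0|1\t2|0"
no_ancestral = "1\t300\t.\tT\tC\t.\tPASS\t.\tGT\t0|1\t0|0"
```

Running `site_problems` over every data line of a VCF and counting the non-empty results would show how many sites fail either check. If every site reports "no known ancestral allele", the ancestral state is probably not reaching tsinfer at all.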

Thanks for reporting this. Can you provide us with a snippet of your VCF? Have you tried with increased verbosity, which should output logs? I'm sure we can help you fix it.

I'll try to put an FAQ together too, and add this to it.

hyanwong avatar Nov 09 '23 12:11 hyanwong
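
For a script that calls tsinfer from Python, one way to get more verbose output is through the standard `logging` module. The snippet below is a minimal sketch. It assumes tsinfer emits its messages through stdlib loggers named under `tsinfer`, which is worth confirming against the installed version.

```python
import logging

# Send log records at INFO level and above from all libraries to stderr.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumption: tsinfer's loggers are named "tsinfer" or "tsinfer.<module>".
# Lowering this logger to DEBUG also applies to its child loggers.
logging.getLogger("tsinfer").setLevel(logging.DEBUG)
```

These lines would go at the top of the script, before the sample data is built and inference is run.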

When we say "No sites used for inference", it would be useful to report the number of non-inference sites of each sort, I think (e.g. V with no defined ancestral allele, W presumed to have more than two alleles, X singletons, Y monoallelic, U unphased, Z user-excluded). What do you think, @benjeffery?

Alternatively, we could have a stand-alone tsinfer.check_sites routine which reports this, and we could recommend running it in the "no inference sites" message.

hyanwong avatar Nov 09 '23 13:11 hyanwong
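
To make the proposal concrete, here is a sketch of what such a per-category report could look like. It is written in plain Python and is not part of tsinfer: the function names, the category labels, the order of the checks, and the use of -1 for missing calls are all assumptions made for illustration.

```python
from collections import Counter

MISSING = -1  # Assumption: missing genotype calls are coded as -1.


def classify_site(alleles, genotypes, ancestral_allele, phased=True, excluded=False):
    """Return "inference", or the first reason the site would be skipped."""
    if excluded:
        return "user-excluded"
    if not phased:
        return "unphased"
    if ancestral_allele is None:
        return "no ancestral allele"
    if len(alleles) > 2:
        return "more than two alleles"
    called = [g for g in genotypes if g != MISSING]
    # Count the samples that carry an allele other than the ancestral one.
    derived = sum(1 for g in called if alleles[g] != ancestral_allele)
    if derived == 0 or derived == len(called):
        return "monoallelic"
    if derived == 1:
        return "singleton"
    return "inference"


def check_sites(sites):
    """Tally how many sites fall into each category."""
    return Counter(classify_site(**site) for site in sites)


# Made-up example sites, one per category of interest.
sites = [
    dict(alleles=["A", "G"], genotypes=[0, 1, 1, 0], ancestral_allele="A"),
    dict(alleles=["A", "G"], genotypes=[0, 0, 1, 0], ancestral_allele="A"),
    dict(alleles=["C", "T"], genotypes=[0, 0, 0, 0], ancestral_allele="C"),
    dict(alleles=["C", "G", "T"], genotypes=[0, 1, 2, 0], ancestral_allele="C"),
    dict(alleles=["A", "G"], genotypes=[0, 1, 1, 0], ancestral_allele=None),
]
counts = check_sites(sites)
```

Each site is assigned only the first reason that applies, so the counts sum to the total number of sites. A real implementation might prefer to report every applicable reason per site.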

Yeah, it would be good to have more detail. We'd have to log it while adding sites, so that we don't have to rescan the sites when the condition is hit.

benjeffery avatar Nov 10 '23 12:11 benjeffery
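
The idea of recording the reason as each site is added, rather than rescanning later, could look something like the sketch below. `SiteTally` is a hypothetical helper, not an existing tsinfer class, and the message format is only a suggestion.

```python
from collections import Counter


class SiteTally:
    """Accumulate per-category site counts at the time each site is added."""

    def __init__(self):
        self.counts = Counter()

    def record(self, category):
        # Called once per added site, e.g. with "inference" or "singleton".
        self.counts[category] += 1

    def no_inference_message(self):
        """Return a detailed message if no site was usable, else None."""
        if self.counts["inference"] > 0:
            return None
        details = ", ".join(
            f"{n} {category}" for category, n in sorted(self.counts.items())
        )
        return f"No sites used for inference ({details})"


tally = SiteTally()
for category in ["singleton", "no ancestral allele", "singleton"]:
    tally.record(category)
message = tally.no_inference_message()
```

Because the counts already exist when the "no sites" condition is hit, producing the detailed message costs nothing beyond formatting.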