tsinfer
No sites used for inference
Dear tskit-dev team!
I always run into this issue when running tsinfer:
Number of polymorphic sites: 3906560
No sites used for inference
No sites used for inference
Inferred tree sequence 'sparrow_ts': 2 trees over 249.23985 Mb
I followed everything exactly as in this tutorial: https://tskit.dev/tsinfer/docs/stable/tutorial.html#reading-a-vcf
I really don't understand the problem. Could it be related to my VCF somehow?
Great tool anyway, and thank you for the help!
Hi @bgyuris! The requirements for a site to be used are listed at https://tskit.dev/tsinfer/docs/stable/inference.html#data-requirements Note that a site must have a known ancestral allele and be biallelic. Are your sites valid?
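One quick way to check the two conditions above directly against the raw VCF is a small plain-Python sketch like the following. It assumes the ancestral allele is stored in the `AA` INFO field (a reserved key in the VCF specification, but your pipeline may store it elsewhere). `parse_info` and `site_problem` are hypothetical helpers, not part of tsinfer, and they check only "known ancestral allele" and "biallelic", not every requirement on the docs page.

```python
def parse_info(info_field):
    """Parse a VCF INFO column such as 'DP=10;AA=G' into a dict (flags map to True)."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value if value else True
    return info


def site_problem(vcf_line):
    """Return a reason why this VCF data line looks unusable for inference, or None.

    Assumption: the ancestral allele is in the 'AA' INFO field.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, info_field = fields[3], fields[4], fields[7]
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    if len(alleles) != 2:
        return "not biallelic"
    aa = parse_info(info_field).get("AA")
    if isinstance(aa, str):
        # Some VCFs pack several values into AA (e.g. 'A|||'); keep the first.
        aa = aa.split("|")[0].upper()
    if not isinstance(aa, str) or aa in ("", "."):
        return "no ancestral allele"
    if aa not in [a.upper() for a in alleles]:
        return "ancestral allele matches neither REF nor ALT"
    return None


lines = [
    "chr1\t100\t.\tA\tG\t.\tPASS\tAA=A\tGT\t0|1\t1|1",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t0|1",
    "chr1\t300\t.\tG\tA,T\t.\tPASS\tAA=G\tGT\t0|1\t2|0",
]
for line in lines:
    print(line.split("\t")[1], site_problem(line) or "ok")
# 100 ok
# 200 no ancestral allele
# 300 not biallelic
```

If nearly every line reports "no ancestral allele", that alone would explain the message, since sites without a known ancestral allele cannot be used for inference.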
Thanks for reporting this. Can you provide us with a snippet of your VCF? Have you tried with increased verbosity, which should output logs? I'm sure we can help you fix it.
I'll try to put an FAQ together too, and add this to it.
When we say "No sites used for inference", I think it would be useful to report the number of non-inference sites of each kind (e.g. V with no defined ancestral allele, W presumed to have more than two alleles, X singletons, Y monoallelic, U unphased, Z user-excluded). What do you think, @benjeffery?
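A breakdown like this could be computed roughly as in the plain-Python sketch below. It does not reflect tsinfer's internals: the `Site` record, its fields, the category names and the order in which the checks are applied are all assumptions for illustration. Genotypes here are haploid allele indexes, `ancestral` is the index of the ancestral allele (or `None` if unknown), and missing data is not handled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Site:
    # Hypothetical site record, for illustration only.
    position: float
    alleles: Tuple[str, ...]
    genotypes: List[int]        # one allele index per (haploid) sample
    ancestral: Optional[int]    # index into alleles, or None if unknown
    phased: bool = True
    user_excluded: bool = False


def exclusion_reason(site: Site) -> Optional[str]:
    """Return the first reason a site would not be an inference site, or None."""
    if site.user_excluded:
        return "user-excluded"
    if not site.phased:
        return "unphased"
    if site.ancestral is None:
        return "no ancestral allele"
    if len(site.alleles) > 2:
        return "multiallelic"
    derived = sum(1 for g in site.genotypes if g != site.ancestral)
    if derived == 0 or derived == len(site.genotypes):
        return "monoallelic"
    if derived == 1:
        return "singleton"
    return None


def inference_report(sites) -> str:
    """Summarise how many sites were used and why the others were excluded."""
    counts = Counter(exclusion_reason(s) for s in sites)
    used = counts.pop(None, 0)
    details = ", ".join(f"{n} {reason}" for reason, n in sorted(counts.items()))
    return f"{used} sites used for inference; excluded: {details or 'none'}"


sites = [
    Site(10, ("A", "G"), [0, 1, 1, 0], ancestral=0),
    Site(20, ("C", "T"), [0, 1, 1, 0], ancestral=None),
    Site(30, ("G", "A"), [0, 0, 0, 1], ancestral=0),
    Site(40, ("G", "A", "T"), [0, 1, 2, 0], ancestral=0),
]
print(inference_report(sites))
# 1 sites used for inference; excluded: 1 multiallelic, 1 no ancestral allele, 1 singleton
```

Reporting only the first matching reason keeps each site in exactly one category, so the counts add up to the total number of sites.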
Alternatively, we could have a stand-alone tsinfer.check_sites routine which reports this, and we could recommend running it in the "no inference sites" message.
Yeah, it would be good to have more detail. We'd have to log the counts while adding sites, so that we don't have to rescan the data when the condition is hit.
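Logging while sites are added can be sketched as a running tally updated inside `add_site`, so the breakdown is already available when the "no inference sites" condition is hit and no second pass over the data is needed. `TallyingSiteBuffer` and its methods are hypothetical and are not tsinfer's `SampleData` API; the classification rules are simplified assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


class TallyingSiteBuffer:
    """Hypothetical site container that counts exclusion reasons as sites arrive."""

    def __init__(self, num_samples: int):
        self.num_samples = num_samples
        self.positions = []        # positions of every site added
        self.reasons = Counter()   # reason -> number of sites (None = usable)

    def add_site(self, position: float, genotypes: Sequence[int],
                 alleles: Sequence[str], ancestral: Optional[int]) -> None:
        if len(genotypes) != self.num_samples:
            raise ValueError("genotypes must have one entry per sample")
        self.positions.append(position)
        # Classify once, on insertion; the tally is then ready without a rescan.
        self.reasons[self._classify(genotypes, alleles, ancestral)] += 1

    @staticmethod
    def _classify(genotypes, alleles, ancestral) -> Optional[str]:
        if ancestral is None:
            return "no ancestral allele"
        if len(alleles) > 2:
            return "multiallelic"
        derived = sum(1 for g in genotypes if g != ancestral)
        if derived in (0, len(genotypes)):
            return "monoallelic"
        if derived == 1:
            return "singleton"
        return None

    def num_inference_sites(self) -> int:
        return self.reasons[None]  # Counter returns 0 for missing keys

    def no_sites_message(self) -> str:
        parts = [f"{n} {r}" for r, n in sorted(
            (r, n) for r, n in self.reasons.items() if r is not None)]
        return "No sites used for inference (" + ", ".join(parts) + ")"


buf = TallyingSiteBuffer(num_samples=4)
buf.add_site(10, [0, 0, 0, 1], ("A", "G"), ancestral=0)
buf.add_site(20, [0, 1, 1, 0], ("C", "T"), ancestral=None)
if buf.num_inference_sites() == 0:
    print(buf.no_sites_message())
# No sites used for inference (1 no ancestral allele, 1 singleton)
```

The per-site cost is one classification plus one counter increment, which is paid anyway while sites are being added.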