Add mean imputation function
ref #609
Add a mean imputation function for call_dosage, call_genotype, and call_genotype_probability
Thanks for looking into this @tszfungc! I think this could be a great approach for imputing call_dosage and call_genotype_probability. However, I don't think it will produce the desired result for call_genotype.
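For reference, here is a rough sketch of what mean imputation could look like for call_dosage. This is not the code in this PR: it assumes a plain NumPy array of shape (variants, samples) with missing calls encoded as NaN, and `mean_impute_dosage` is a hypothetical name used only for illustration.

```python
import numpy as np

def mean_impute_dosage(call_dosage):
    """Replace NaN dosages with the per-variant mean over samples.

    Assumes shape (variants, samples) and NaN for missing calls.
    """
    dosage = np.asarray(call_dosage, dtype=float)
    # Mean over the samples axis, ignoring missing (NaN) calls.
    variant_mean = np.nanmean(dosage, axis=1, keepdims=True)
    # np.where builds a new array, so the input is left untouched.
    return np.where(np.isnan(dosage), variant_mean, dosage)

call_dosage = np.array([[0.0, 2.0, np.nan],
                        [1.0, np.nan, 1.0]])
imputed = mean_impute_dosage(call_dosage)
```

A variant with no called samples at all would stay NaN here (and `np.nanmean` would emit a warning), so that case would probably need separate handling.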
The values in call_genotype are (potentially unsorted) alleles whose order along the ploidy dimension doesn't have any particular meaning. So, as far as I can tell, the mean of those alleles can't really be used for anything.
Thanks for the review @tomwhite @timothymillar. I agree that the allele order doesn't have a particular meaning. The order along ploidy can be ignored by computing the mean along dim=['samples', 'ploidy'], but that also seems like an unusual use to me.
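To make the mean over both dimensions concrete, here is a small sketch. It assumes call_genotype is an integer array of shape (variants, samples, ploidy) with missing alleles encoded as -1; `mean_allele_per_variant` is a hypothetical helper, not something from this PR.

```python
import numpy as np

def mean_allele_per_variant(call_genotype):
    """Mean allele value per variant over samples and ploidy.

    Assumes shape (variants, samples, ploidy) and -1 for missing alleles.
    """
    gt = np.asarray(call_genotype)
    called = gt >= 0
    # Sum and count only the called alleles, ignoring their order.
    total = np.where(called, gt, 0).sum(axis=(1, 2))
    count = called.sum(axis=(1, 2))
    # Variants with no called alleles get NaN instead of a division error.
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

call_genotype = np.array([
    [[0, 1], [1, 1], [-1, -1]],
    [[0, 0], [0, -1], [1, 0]],
])
mean_allele = mean_allele_per_variant(call_genotype)
```

For biallelic variants this value is just the alternate allele frequency. It is a float per variant, so it can't be written back into an integer call_genotype array without some further rounding or sampling step, which is part of why this use looks unusual.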
@jeromekelleher the trade-off between returning new variables or replacing existing variables was previously discussed in https://github.com/pystatgen/sgkit/pull/308#issuecomment-705706571. I personally have a slight preference for replacing existing variables but there are some good points raised in that discussion. The primary concern seems to be that replacing existing variables is effectively a mutate operation, which goes against the general pattern of treating arrays as immutable.
I see, thanks. Hmm, not much choice other than to create a bunch of new variables then.
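A minimal sketch of the "return a new variable" pattern, for illustration only. A plain dict of NumPy arrays stands in for the dataset here, and the `call_dosage_imputed` name is an assumption rather than an agreed naming convention.

```python
import numpy as np

def with_imputed_dosage(ds):
    """Return a new mapping with an extra imputed variable.

    `ds` is a plain dict standing in for a dataset; the original
    `call_dosage` entry is never modified or replaced.
    """
    dosage = ds["call_dosage"]
    variant_mean = np.nanmean(dosage, axis=1, keepdims=True)
    imputed = np.where(np.isnan(dosage), variant_mean, dosage)
    # Shallow copy plus a new key: existing arrays are shared, not mutated.
    return {**ds, "call_dosage_imputed": imputed}

ds = {"call_dosage": np.array([[0.0, np.nan, 2.0]])}
new_ds = with_imputed_dosage(ds)
```

The input keeps its missing values and the caller decides which variable to pass downstream, at the cost of one extra variable per imputed field.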
This PR has conflicts, @tszfungc please rebase and push updated version 🙏