Tom White
I started looking at [implementing gene-ε in Python](https://github.com/tomwhite/sgkit/tree/genee/validation/gwas/method/genee), beginning with a simple port of the R code to use Pandas and scikit-learn. I used [GaussianMixture](https://scikit-learn.org/stable/modules/mixture.html#gmm) as the Python equivalent to...
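For readers unfamiliar with the scikit-learn side, here is a minimal sketch of fitting a `GaussianMixture` to a one-dimensional array. It is not the code from the linked branch: the helper name `fit_mixture`, the input `values`, and the choice of `n_components=2` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_mixture(values, n_components=2, random_state=0):
    """Fit a 1-D Gaussian mixture and return (weights, means, variances).

    Hypothetical helper for illustration; not part of the gene-ε port.
    """
    # scikit-learn expects a 2-D array of shape (n_samples, n_features)
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    model = GaussianMixture(n_components=n_components, random_state=random_state)
    model.fit(X)
    # With the default covariance_type="full", covariances_ has shape
    # (n_components, 1, 1) for 1-D data, so flatten it to one variance each
    return model.weights_, model.means_.ravel(), model.covariances_.ravel()


# Synthetic data: two well-separated clusters (assumed, not real GWAS output)
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(-5.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
weights, means, variances = fit_mixture(values)
```

The fitted `weights`, `means`, and `variances` describe each component; how those map onto the quantities the R code uses is left to the port itself.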
Thanks @eric-czech, great summary! I agree that it's worth trying out an approach using the DataFrame API and I'll try that first. I'd also be curious to see what a...
Thanks @lorinanthony for confirming that! @eric-czech I have implemented your suggestion using Dask dataframes here: https://github.com/tomwhite/sgkit/blob/genee/validation/gwas/method/genee/test_genee_dask.py. It still doesn't include a regression step.
> So let's see how far we can get with sklearn. I created this notebook to investigate: https://nbviewer.org/github/tomwhite/sgkit/blob/genee/validation/gwas/method/genee/genee-ld-simulation.ipynb TLDR: memory is the limiting factor, but we can get quite a...
See https://github.com/brentp/cyvcf2/issues/248
> In general, I think we should try to get a point where we can determine the name of a method to compute a variable from the variable name. One...
I updated this to use the latest code, and stubbed out some of the numba calls: https://github.com/tomwhite/sgkit/tree/pyodide-latest. This simplifies its usage a bit: ``` Welcome to the Pyodide terminal emulator...
Just wanted to note here that the [GA4GH Variation Representation Specification](https://vrs.ga4gh.org/en/latest/index.html) uses [inter-residue coordinates](https://vrs.ga4gh.org/en/latest/appendices/design_decisions.html#inter-residue-coordinates-design).
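To make the coordinate convention concrete, here is a small sketch of converting between 1-based closed (residue) coordinates and inter-residue coordinates, which number the positions between residues starting at 0 and so behave like Python slice bounds. The function names are hypothetical and are not taken from VRS or sgkit.

```python
def residue_to_inter_residue(start, end):
    """Convert a 1-based closed interval [start, end] to inter-residue (start, end)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # The boundary before residue `start` is start - 1; the one after `end` is end
    return start - 1, end


def inter_residue_to_residue(start, end):
    """Convert a non-empty inter-residue interval back to 1-based closed coordinates."""
    if start < 0 or end <= start:
        # A zero-length interval (an insertion point) has no closed 1-based form
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


sequence = "ACGT"
# Residues 2..3 ("CG") in 1-based closed coordinates
ir_start, ir_end = residue_to_inter_residue(2, 3)
# Inter-residue coordinates can be used directly as a Python slice
selected = sequence[ir_start:ir_end]
```

One consequence of this design is that an empty interval such as `(2, 2)` is a valid inter-residue location (the point between residues 2 and 3), which residue-based coordinates cannot express without an extra convention.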
Currently, window variables (`window_{contig,start,stop}`) are numpy arrays, which as Eric points out in the comment do not scale well to 100M variants. I think there are two things to do:...
Thanks @timothymillar, this would be a good addition in the future. What do you mean by "up to the point of overflowing the index"?