sgkit

Scalable genetics toolkit

Results 216 sgkit issues

We should collect weird VCFs and ensure we parse them correctly. It would be nice if this zoo had metadata about what's interesting about a file. @jeromekelleher can you point...

IO

IO functions often take Dask chunk specifications and my type hints for those have been inconsistent. We should add a `ChunkType` or something like it so it is more clear...
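As a rough illustration of the idea, such an alias might look like the sketch below. The name `ChunkType` comes from the issue text; `DimChunk` and the helper `expand_chunks` are hypothetical names used only for this example and are not part of sgkit. The alias covers only a few of the chunk forms Dask accepts (an int, a string such as `"auto"`, or a per-dimension tuple), so it is not exhaustive.

```python
from typing import Tuple, Union

# Hypothetical aliases: a chunk spec for one dimension, and a spec for a whole array.
DimChunk = Union[int, str, Tuple[int, ...]]
ChunkType = Union[int, str, Tuple[DimChunk, ...]]


def expand_chunks(chunks: ChunkType, ndim: int) -> Tuple[DimChunk, ...]:
    """Expand a scalar chunk spec to one entry per dimension (illustrative only)."""
    if isinstance(chunks, (int, str)):
        # A single int or string applies to every dimension.
        return (chunks,) * ndim
    if len(chunks) != ndim:
        raise ValueError(f"expected {ndim} chunk specs, got {len(chunks)}")
    return tuple(chunks)
```

With a shared alias, every IO function could annotate its `chunks` parameter the same way instead of repeating slightly different unions.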

For association testing and PCA (at least), it may be useful to have a function that imputes dosages/allele counts. With floating point values (i.e. from bgen), this can be very...

good first issue
help wanted
core operations
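For the dosage imputation issue above, one simple strategy is per-variant mean imputation. The sketch below assumes that approach, a `(variants, samples)` layout, and NaN as the missing marker for floating point dosages; none of these are stated in the issue, and `impute_mean_dosage` is a hypothetical name rather than an sgkit function.

```python
import numpy as np


def impute_mean_dosage(dosage: np.ndarray) -> np.ndarray:
    """Replace NaN dosages with the per-variant mean of the observed samples.

    Assumes `dosage` has shape (variants, samples). Variants with no
    observed samples are left as NaN.
    """
    dosage = np.asarray(dosage, dtype=float)
    missing = np.isnan(dosage)
    # Count and sum the observed values for each variant.
    counts = (~missing).sum(axis=1, keepdims=True)
    sums = np.where(missing, 0.0, dosage).sum(axis=1, keepdims=True)
    # Divide only where at least one sample was observed; elsewhere keep NaN.
    means = np.divide(
        sums, counts, out=np.full_like(sums, np.nan), where=counts > 0
    )
    return np.where(missing, means, dosage)
```

Other imputation strategies are possible; the mean is used here only because it is the simplest to show.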

Because our library may define many precursor variables for any one calculation, it will become crucial for users to be able to persist/cache some of those variables. This means that...

documentation

Currently a lower ploidy sample appears to have missing alleles, i.e. -2 is treated as -1. The calls `[[0, 0, 1, 1], [0, 1, -2, -2], [0, 0, 1,...
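To make the distinction concrete, here is a minimal sketch that counts the two sentinels separately. It assumes the convention implied by the snippet: -1 marks a missing allele and -2 pads a call from a lower ploidy sample. The function name `ploidy_and_missing` is hypothetical, and the third row of the example array is invented for illustration rather than taken from the truncated issue text.

```python
import numpy as np

MISSING = -1  # assumed sentinel: allele was not called
FILL = -2     # assumed sentinel: padding for a sample with lower ploidy


def ploidy_and_missing(calls):
    """Return per-call ploidy and missing-allele counts (illustrative only)."""
    calls = np.asarray(calls)
    # Padding slots do not count toward ploidy.
    ploidy = (calls != FILL).sum(axis=-1)
    # Only true missing alleles are counted as missing.
    n_missing = (calls == MISSING).sum(axis=-1)
    return ploidy, n_missing


# Example data: a tetraploid call, a diploid call padded with -2,
# and a tetraploid call with one missing allele.
calls = [[0, 0, 1, 1], [0, 1, -2, -2], [0, -1, 1, 1]]
ploidy, n_missing = ploidy_and_missing(calls)
```

Treating -2 as -1 would instead report the diploid call as a tetraploid call with two missing alleles.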

The `regenie` function currently calls the number of phenotypes dimension "outcomes" while `gwas_linear_regression` uses the name "traits". I think "traits" is more aligned with our naming style since it has...

The current regenie implementation produces LOCO predictions with the shape `(contigs, samples, outcomes)`. It isn't immediately obvious from the documentation how to use this, since applying `gwas_linear_regression` to the...

See https://github.com/pystatgen/sgkit/pull/303#discussion_r507906940

Motivated by @hyanwong at https://github.com/pystatgen/sgkit/discussions/580. Some popgen methods need to know which allele at each site is the ancestral allele (cf. https://biology.stackexchange.com/questions/19159/ancestral-allele-explanation). We should augment our data model to optionally...