sgkit

Scalable genetics toolkit

Results 216 sgkit issues

Currently we use fixed-length strings for storing alleles, but this is inefficient since the length is the size of the longest allele in the whole dataset. For example, in some...

data representation
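
To make the cost described in the allele-storage issue above concrete, here is a minimal sketch in plain Python. It mimics fixed-length storage by padding every allele to the length of the longest one, which is effectively what a fixed-width string array does. The allele values are hypothetical and chosen only for illustration; this is not sgkit's actual storage code.

```python
# Hypothetical alleles: four single bases and one 100-base insertion.
alleles = [b"A", b"C", b"G", b"T", b"A" * 100]

# Fixed-length storage: every entry is padded to the longest allele.
width = max(len(a) for a in alleles)
fixed = [a.ljust(width, b"\x00") for a in alleles]
fixed_bytes = sum(len(a) for a in fixed)

# Bytes needed for the allele characters alone (no padding).
needed_bytes = sum(len(a) for a in alleles)

print(width, fixed_bytes, needed_bytes)
```

With these inputs a single long allele sets the width for every entry, so the padded layout uses 500 bytes to hold 104 bytes of allele data.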

### Discussed in https://github.com/pystatgen/sgkit/discussions/652

Originally posted by **patrick-koenig** August 19, 2021

Hi sgkit team, is it planned to port the GFF3 I/O utility methods (especially the gff3_to_dataframe() method) of scikit-allel...

IO

Some discussion in https://github.com/pystatgen/sgkit/pull/647#discussion_r686015457

As pointed out by @jeromekelleher, it's interesting to look at the main [init](https://github.com/pystatgen/sgkit/blob/5d58d09f22f023c866e6b252662f84f83302eb61/sgkit/__init__.py) to see current names. Some current options:

* `count_` prefix
* `infer_` prefix...

documentation
process + tools

Tracking issue for https://github.com/brentp/cyvcf2/issues/216

IO

Anyone object to us adding [tskit](https://tskit.readthedocs.io/en/latest/) as an import format, so we have an ``sgkit-tskit`` repo? I'm happy to do the coding here, and I think it'll be a useful...

enhancement
question
IO

This would include the following:

- [ ] Use the same code for contig interpretation
- [ ] Make the docs consistent
- [ ] Allow the bgen file to...

IO

Alleles are a challenge to represent efficiently in fixed-length arrays. There are a couple of problems:

1. the number of alleles is not known until the whole VCF file has...

data representation

On `invoked build`, an error related to numpy typing is currently thrown when running the Glow WGR functions: ``` 2021-07-07 18:39:42,401|INFO|__main__.run:206| -------------------------------------------------- 2021-07-07 18:39:42,401|INFO|__main__.run:207| Covariate info: 2021-07-07 18:39:42,403|INFO|__main__.run:208| Index: 50...

bug

The README in validation says the datasets used in test_regenie were generated via Hail and stored as PLINK files. We need these original PLINK files to run Glow and validate...

question

The CSV files in sgkit/tests/test_regenie/result could benefit from file descriptions as some of the column names are not clear. @EpiSlim

question