sgkit issues

Results 216 sgkit issues


Codecov reports are not consistently posted on PRs

On some PRs listed at https://app.codecov.io/gh/pystatgen/sgkit/pulls?page=1&state=open&order=-pullid, Codecov says "Missing base report".

tomwhite

Use conda `environment.yml` file for setting up the environment

At the moment we use `requirements.txt` to create the environment for development and usage. Although it's quick to get started with pip, I think it's a good idea to start...

aktech

Possible race condition in _concat_zarrs_optimized

```
================================== FAILURES ===================================
_________________ test_vcfzarr_to_zarr[None-True-True-False] __________________

shared_datadir = WindowsPath('C:/Users/runneradmin/AppData/Local/Temp/pytest-of-runneradmin/pytest-0/test_vcfzarr_to_zarr_None_True1/data')
tmpdir = local('C:\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-runneradmin\\pytest-0\\test_vcfzarr_to_zarr_None_True1')
grouped_by_contig = True, consolidated = True, has_variant_id = False
concat_algorithm = None

    @pytest.mark.parametrize(
        "grouped_by_contig, consolidated, has_variant_id",
        [...
```

tomwhite

Don't gate IO libraries by default

Given the presence of wheels for all 3 of our upstream IO libraries, I think it makes sense to favor convenience now and have `pip install sgkit` pull in the...

hammer

Extract additional metadata from VCF files

- [VCF 4.2 spec](https://samtools.github.io/hts-specs/VCFv4.2.pdf)
- Example VCF file: https://storage.googleapis.com/hail-tutorial/1kg.vcf.bgz
- [cyvcf2.pyx](https://github.com/brentp/cyvcf2/blob/master/cyvcf2/cyvcf2.pyx)
- Header types: 'CONTIG', 'FILTER', 'FORMAT', 'GENERIC', 'INFO'
- [vcf_reader.py](https://github.com/pystatgen/sgkit/blob/master/sgkit/io/vcf/vcf_reader.py)

### ##INFO

- These fields are (usually?) per variant...

hammer

data representation

Genome region query API

Raising this issue to discuss API for selecting data from a given genome region, which could be either a whole contig or a contiguous region within a contig. Breaking this...

alimanfoo

data representation
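To make the region-query discussion above concrete, here is a minimal sketch of one way to select variants in a contig or in a contiguous region within it. It is not sgkit's API: the function name `region_slice` is hypothetical, the inputs stand in for per-variant contig indexes and positions (called `variant_contig` and `variant_position` here by assumption), and it assumes variants are sorted by contig and then by position, with inclusive `start` and `end`.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def region_slice(
    variant_contig: Sequence[int],
    variant_position: Sequence[int],
    contig: int,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> slice:
    """Return the slice of variant indexes that fall in a region.

    Hypothetical helper. Assumes variants are sorted by contig index and
    then by position, and that `start` and `end` are inclusive.
    """
    # Locate the block of variants belonging to the requested contig.
    lo = bisect_left(variant_contig, contig)
    hi = bisect_right(variant_contig, contig)
    # Narrow the block to the requested positions, if any were given.
    if start is not None:
        lo = bisect_left(variant_position, start, lo, hi)
    if end is not None:
        hi = bisect_right(variant_position, end, lo, hi)
    return slice(lo, hi)


# Toy data: two contigs (0 and 1), positions sorted within each contig.
contigs = [0, 0, 0, 1, 1, 1, 1]
positions = [10, 20, 30, 5, 15, 25, 35]

whole_contig = region_slice(contigs, positions, contig=1)
sub_region = region_slice(contigs, positions, contig=1, start=10, end=25)
```

Returning a `slice` rather than a boolean mask keeps the selection contiguous, which tends to suit chunked array storage, but either form could serve depending on how the API discussion resolves.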

Identify lack of scalability in gwas_linear_regression

It appears that this function does not scale well when run on a cluster. Notes from my most recent attempt:

- The code I ran is here: https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/blob/4f862e31b8093d25fdaa8da7f841b9be8583cda4/scripts/gwas.py#L268
- This...

eric-czech

performance

Investigate further chunking improvements for better GWAS performance

#454 helped with GWAS performance, but as mentioned in https://github.com/pystatgen/sgkit/issues/390#issuecomment-768411149, there is scope for further improvement since the transfer time is still a significant proportion of the compute time.

tomwhite

performance

Investigate use of preemptible GCP instances for GWAS

In #390 (and processing in general), using [preemptible instances](https://cloud.google.com/compute/docs/instances/preemptible) on GCP would bring a [cost saving of ~5x](https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/issues/32#issuecomment-748934617).

tomwhite

performance

Move to NumPy's ArrayLike and DtypeLike

Introduced in NumPy 1.20.0: https://numpy.org/doc/stable/release/1.20.0-notes.html#numpy-is-now-typed

These would replace our types in `sgkit.typing`.

tomwhite

process + tools
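As a rough illustration of the NumPy typing change proposed above, the sketch below annotates a function with the aliases from `numpy.typing` (NumPy 1.20 or later). Note that NumPy spells the second alias `DTypeLike`. The function `as_array` is a hypothetical example, not something taken from `sgkit.typing`.

```python
import numpy as np
from numpy.typing import ArrayLike, DTypeLike


def as_array(values: ArrayLike, dtype: DTypeLike = None) -> np.ndarray:
    """Hypothetical helper: coerce any array-like input to an ndarray."""
    # ArrayLike covers lists, tuples, scalars, and existing arrays;
    # DTypeLike covers dtype objects, type objects, strings, and None.
    return np.asarray(values, dtype=dtype)


calls = as_array([0, 1, 2], dtype="int8")
```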
