sgkit

Scalable genetics toolkit

Results 243 sgkit issues

While developing the alternative implementation of vcf-to-zarr conversion in #1185, I found what I think are some bugs in how we're currently handling missing data. Opening this PR for discussion purposes. There's some...

The ``all_fields.vcf`` file contains lots of examples where we explicitly state that an INFO key is missing, rather than omitting the key, e.g. `` II1=. `` and ``II2=.,.`` here. This...
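To make the distinction concrete, here is a minimal sketch of how an explicitly missing INFO value (`II1=.`, `II2=.,.`) differs from a key that is simply omitted. `parse_info` is a hypothetical helper written for illustration only; it is not sgkit's parser and does not reflect how `vcf_to_zarr` handles these fields.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column into a dict (hypothetical helper, not sgkit code).

    Explicit '.' entries are kept as None so they can be told apart
    from keys that are absent from the record altogether.
    """
    fields = {}
    if info == ".":
        # The whole INFO column is missing: no keys at all.
        return fields
    for item in info.split(";"):
        if "=" not in item:
            # A key without a value is a flag.
            fields[item] = True
            continue
        key, raw = item.split("=", 1)
        fields[key] = [None if value == "." else value for value in raw.split(",")]
    return fields


parsed = parse_info("II1=.;II2=.,.;AA=T")
print(parsed)
```

In this sketch `II1` and `II2` are present with missing values, whereas a key such as `II3` would not appear in the result at all.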

For example, in the test ``test_vcf_scikit_allel.py::test_all_fields``, the ``variant_AA`` field is reported as ``['' '' '' '' 'T' 'T' 'G' '' ''] `` instead of ``['.' '.' '.' '.' 'T' 'T'...

bug

After correcting the missing-fill-bug (#1192) in #1190, the vcf_writer round trip tests fail. We get: ``` $ python3 -m pytest -vs sgkit/tests/io/vcf/test_vcf_roundtrip.py::test_vcf_to_zarr_to_vcf__real_files[sample.vcf.gz-None-True] ``` ``` ), f"INFO keys not equal for...

bug

While looking at the Getting Started guide, I found that the following links are broken:
- https://tutorial.dask.org/03_array.html#dask.array-contains-these-algorithms
- https://tutorial.dask.org/01x_lazy.html
In this PR, I have replaced them with correct links:
- https://tutorial.dask.org/02_array.html#Blocked-Algorithms-in-a-nutshell...

auto-merge

When you look at a dataset derived from VCF in a notebook, you get this: ![Screenshot from 2023-12-14 13-00-11](https://github.com/pystatgen/sgkit/assets/2664569/7fca9c18-326a-435e-a903-7a7c8512ccdf) ![Screenshot from 2023-12-14 12-59-49](https://github.com/pystatgen/sgkit/assets/2664569/9f7932b8-163f-4fb4-8d9f-87b61f249917) The attributes are automatically "open", and this means...

enhancement

This arose in the context of https://github.com/pystatgen/sgkit-publication/issues/35#issuecomment-1840492652 where dask workers being rotated due to slow memory leaks caused work to be redone and the VCF parse to never complete. This...

When you click on the "show source" link for a function (e.g. [here](https://pystatgen.github.io/sgkit/latest/generated/sgkit.count_variant_genotypes.html#sgkit-count-variant-genotypes)), it shows the rst source, not the actual Python. I guess a first step would be to...

documentation

Related to #1035. It looks like the code for `write_vcf` wasn't updated to use the new array variables [(line)](https://github.com/pystatgen/sgkit/blob/main/sgkit/io/vcf/vcf_writer.py#L415). I also can't see "filter_id" in variables.py.

bug

@tnguyengel has hit the following error while running `vcf_to_zarr` with the default arguments: ``` File "/home/tnguyen/conda/sgkit_main/lib/python3.10/site-packages/zarr/core.py", line 2168, in _process_for_setitem chunk = value.astype(self._dtype, order=self._order, copy=False) ValueError: could not convert string...

bug
upstream