
tskit example of sgkit Zarr for intermediate data

Open · hammer opened this issue 2 years ago · 2 comments

To be assigned to @benjeffery once he's a member of our org!

hammer · Nov 21 '22 16:11

https://github.com/pystatgen/sgkit/issues/347 may be related

hammer · Nov 21 '22 17:11

The point we're illustrating here is the power of open, extensible formats. Previously we had to convert VCFs to our own Zarr formats, which was time-consuming and tedious. Now we can just add a few extra fields and some metadata to the sgkit dataset. This lets the user do QC directly and avoids keeping several copies of the data. The one exception is the initial extraction from VCF, but I imagine we'll have made the point about columnar binary storage well by then.

jeromekelleher · Nov 23 '22 09:11