Results 506 comments of Tom White

This has now been fixed in Dask, so it should be possible to check whether it resolves the issue here.

Hi @abalter - sgkit converts VCF files to Zarr format, which can then be opened as Xarray. So it's not a Pandas dataframe, but it should be possible to convert...

Thanks for posting this @LiangdeLI. I had one quick comment:

> 2\. In sgkit GWAS it says 'To run PCA we need to filter out variants with any missing alt...

Looks like mypy needs `int` rather than `Number` now: https://github.com/pydata/xarray/commit/a73628317acd73cb55f03ad036708d493f4a8b54
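As a rough illustration of why the annotation matters (this is a sketch, not the code from the linked commit, and both helper names are hypothetical): mypy does not treat `int` as compatible with the abstract `numbers.Number` type, even though the `isinstance` check passes at runtime.

```python
from numbers import Number


def chunk_label_loose(n: Number) -> str:
    # Hypothetical helper annotated with the abstract Number type.
    return f"{n} chunks"


def chunk_label_strict(n: int) -> str:
    # The same hypothetical helper annotated with the concrete int type.
    return f"{n} chunks"


# At runtime int is registered as a Number, so both calls succeed.
assert isinstance(4, Number)
print(chunk_label_loose(4))   # mypy typically flags this argument as incompatible
print(chunk_label_strict(4))  # accepted by mypy
```

Both calls run without error; the difference only shows up under static type checking.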

I ran a few experiments to [simulate preemption](https://cloud.google.com/compute/docs/instances/preemptible#preemption-process) by stopping a worker VM midway through a job. Here is a normal run on a cluster of 16 instances with no...

Note that all of these experiments were done just by stopping the worker abruptly. There is an unmerged Dask pull request to [make workers handle shutdown gracefully](https://github.com/dask/distributed/pull/2844). The idea is that...

We already have some of this, but it would be good to improve it and make it more standard. In particular, we don't link to the different versions we have...

I'm not sure exactly, but it looks like a race between the [first](https://github.com/pystatgen/sgkit/blob/master/sgkit/io/vcfzarr_reader.py#L254) and [second](https://github.com/pystatgen/sgkit/blob/master/sgkit/io/vcfzarr_reader.py#L259) updates of the Zarr attributes for a variable (`variant_allele`). Perhaps those two blocks should...

I tried to reproduce this again, but no luck. I tried creating standalone test cases that I could run more frequently, but I never saw the error locally on my...