Jeff Hammerbacher

69 comments by Jeff Hammerbacher

To get started (will move to PR): Download data from https://datadryad.org/stash/dataset/doi:10.5061/dryad.266k4. [cg-hail-eda.ipynb](https://github.com/related-sciences/gwas-analysis/blob/master/notebooks/organism/canine/cg-hail-eda.ipynb)...

```{python}
import pandas as pd
import sgkit as sg
from sgkit.io.plink import read_plink

cc = read_plink(path='cornell_canine/cornell_canine')
cc.to_zarr('cc.zarr')
ccz = sg.load_dataset('cc.zarr')
# Take the first and second allele columns as separate ref/alt variables
ccz['variant_ref'] = ccz.variant_allele[:,0]
ccz['variant_alt'] = ccz.variant_allele[:,1]
df_variant = pd.DataFrame({k.split('_',1)[1]: v...
```

Okay, I have a branch where I'll build up this notebook: https://github.com/hammer/sgkit/blob/canine/docs/examples/canine.ipynb. The download is 90 MB, which is not huge, but it's not small either. Is there...

I personally find augmented assignment statements less readable. If others disagree, that’s fine.
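For comparison, here is a minimal plain-Python illustration of the two spellings. The variable names are made up. Besides reading differently, `x += y` and `x = x + y` also behave differently for mutable objects such as lists.

```python
# Two spellings of the same update. For immutable values such as ints they
# behave identically; only readability differs.
total = 10
total += 5                # augmented assignment
explicit = 10
explicit = explicit + 5   # explicit rebinding

# For mutable objects the two are not interchangeable: `+=` on a list
# extends it in place (so every alias sees the change), while `x = x + y`
# builds a new list and rebinds only `x`.
a = [1, 2]
alias_a = a
a += [3]                  # in-place extend: alias_a changes too

b = [1, 2]
alias_b = b
b = b + [3]               # new list: alias_b is untouched
```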

Hey @LiangdeLI, are you able to share `output.zarr`?

@LiangdeLI, [`xarray.DataArray.values`](http://xarray.pydata.org/en/stable/generated/xarray.DataArray.values.html) will compute a large Dask-backed array and load it into memory as a single NumPy array. It might be helpful to work through the Xarray tutorial ([video](https://youtu.be/mecN-Ph_-78), [code](https://github.com/xarray-contrib/xarray-tutorial)) and the Dask...
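Here is a minimal sketch of the difference, assuming `xarray` and `dask` are installed. The array shape, chunk sizes and dimension names are arbitrary stand-ins for a large genotype array and are not taken from `output.zarr`.

```python
import dask.array as da
import numpy as np
import xarray as xr

# A toy Dask-backed DataArray standing in for a large array.
# Shape and chunk sizes are arbitrary, chosen only for illustration.
arr = xr.DataArray(
    da.ones((10_000, 100), chunks=(1_000, 100)),
    dims=("variants", "samples"),
)

# `.data` returns the underlying Dask array; nothing is computed yet.
lazy = arr.data

# Reductions on the DataArray stay lazy as well; `.compute()` materializes
# only the small result (one value per sample here).
per_sample = arr.sum(dim="variants").compute()

# `.values`, by contrast, computes the *whole* array and returns it as one
# in-memory NumPy array, which can exhaust memory for large inputs.
everything = arr.values
```

The usual pattern is to keep work lazy with reductions or chunk-wise operations and call `.compute()` (or `.values`) only on results that are known to be small.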

> I'm +1 for a core API based on Xarray and higher-level functions/classes that hide it.

Ah, my first opportunity to quote [Design principles for a new GWAS Toolkit](https://discourse.smadstatgen.org/t/design-principles-for-a-new-gwas-toolkit/28):

>...
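One way to picture that split, as a sketch only: a core function that takes and returns an `xarray.Dataset`, plus a thin wrapper that accepts and returns plain NumPy arrays. Both function names (`add_variant_call_rate`, `compute_call_rates`) are hypothetical and not part of sgkit's API. The variable and dimension names are loosely modeled on sgkit's conventions, and negative genotype values are assumed to mean "missing".

```python
import numpy as np
import xarray as xr


def add_variant_call_rate(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical core function: Xarray in, Xarray out.

    Expects `call_genotype` with dims (variants, samples, ploidy), where
    negative values mark missing alleles. Returns a new Dataset with a
    per-variant call rate added; the input Dataset is not modified.
    """
    # A call counts as "called" only if every allele in it is non-missing.
    called = (ds["call_genotype"] >= 0).all(dim="ploidy")
    return ds.assign(variant_call_rate=called.astype(float).mean(dim="samples"))


def compute_call_rates(genotypes: np.ndarray) -> np.ndarray:
    """Hypothetical higher-level wrapper that hides Xarray from the caller."""
    ds = xr.Dataset(
        {"call_genotype": (("variants", "samples", "ploidy"), genotypes)}
    )
    return add_variant_call_rate(ds)["variant_call_rate"].values
```

The core function composes with other Dataset-level functions, while the wrapper keeps Xarray out of sight for users who only want an array back.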

cc @eric-czech: when you get back, it would be good to confirm that this default doesn’t conflict with your vision for the ergonomics of sgkit.

> it's currently quite awkward to get contig names (rather than indexes) when looking at summaries of the data

I see 3 separate issues here:

1. Read in contig metadata from...
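The lookup from index to name can be sketched as follows. This assumes each variant carries an integer contig index and the contig names are available as a separate ordered list. Both inputs here are hypothetical toy values.

```python
import numpy as np

# Hypothetical inputs: an ordered list of contig names (e.g. from a file
# header) and the integer contig index stored for each variant.
contigs = ["chr1", "chr2", "chrX"]
variant_contig = np.array([0, 0, 1, 2, 2, 2])

# Fancy indexing translates every index to its name in one vectorized step.
variant_contig_name = np.asarray(contigs)[variant_contig]

# Summaries can then be reported by name rather than by index.
names, counts = np.unique(variant_contig_name, return_counts=True)
summary = dict(zip(names.tolist(), counts.tolist()))
```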

Related to this topic I've been thinking about how we might use the Pandas support for categorical data (e.g. [pandas.Categorical](https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html), [pandas.factorize](https://pandas.pydata.org/docs/reference/api/pandas.factorize.html)) to represent contigs and alleles. It's unfortunate that this...
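Here is a small illustration of the pandas tools mentioned above, run on made-up contig and allele labels. `pd.factorize` splits values into integer codes plus uniques (in order of first appearance), and `pd.Categorical.from_codes` reassembles them. This is only a sketch of the idea, not a proposed sgkit representation.

```python
import pandas as pd

# Toy per-variant contig labels; the values are made up for illustration.
contig_labels = pd.Series(["chr1", "chr1", "chr2", "chrX", "chr2"])

# factorize returns integer codes plus the unique values, i.e. the same
# index/name split discussed for contigs.
codes, uniques = pd.factorize(contig_labels)

# A Categorical bundles codes and categories back together, so values
# display as names while being stored as small integers.
contig_cat = pd.Categorical.from_codes(codes, categories=uniques)

# Alleles could be treated the same way, e.g. a column of reference alleles
# (categories are inferred and sorted here).
ref_cat = pd.Categorical(["A", "G", "A", "T", "G"])
```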