Tom White comments

Results: 506 comments by Tom White

Executor for Apache Spark

A Spark executor would be a great addition. I just added some notes about implementing a new executor in #498 if you're interested in having a go at this @rbavery?

Executor for Apache Spark

@songhan89 you could do this by transforming the Cubed DAG (a NetworkX MultiDiGraph of pipeline objects) into a Spark DAG of RDD objects, then computing the DAG of RDDs in...
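
The general shape of such a translation can be sketched in plain Python. This is only an illustration under stated assumptions: it uses neither Cubed, NetworkX nor Spark, the graph `dag`, the operations in `ops` and the helper `build_lazy` are all hypothetical, and zero-argument closures stand in for lazily evaluated RDD-like objects. It walks a DAG in dependency order and builds one lazy object per node from the lazy objects of its inputs, so nothing runs until the final node is evaluated.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for a plan DAG: each node maps to the nodes it depends on.
dag = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
}

# Hypothetical per-node operations; each takes the results of its dependencies.
ops = {
    "a": lambda: 1,
    "b": lambda a: a + 1,
    "c": lambda a: a * 10,
    "d": lambda b, c: b + c,
}


def build_lazy(dag, ops):
    """Translate each node into a lazy callable built from its inputs' callables."""
    lazy = {}
    # static_order() yields every node after all of its dependencies.
    for node in TopologicalSorter(dag).static_order():
        deps = [lazy[d] for d in dag[node]]
        # Default arguments bind the current op and deps to this closure.
        lazy[node] = lambda op=ops[node], deps=deps: op(*(d() for d in deps))
    return lazy


lazy = build_lazy(dag, ops)
result = lazy["d"]()  # evaluation is only triggered here
```

In this toy version a shared input such as `a` is re-evaluated for each consumer; a real executor would need to decide whether to cache or recompute intermediate results.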

Executor for Apache Spark

@songhan89 Great progress! It looks like the failures are because some of the Cubed unit tests use a small `allowed_mem` setting (100000 - i.e. 100kB), and Spark doesn't allow values...

Improve tensordot performance with auto-rechunking

Thanks for working on this @GenevieveBuckley. > I need some better examples that more thoroughly cover the space of possible input & output shapes. The example I mentioned in https://github.com/dask/dask/issues/7847#issuecomment-874749110...

Deprecate VCF write functions

We could also add a link to `vcftools view` to the documentation for [`display_genotypes`](https://sgkit-dev.github.io/sgkit/latest/generated/sgkit.display_genotypes.html#sgkit.display_genotypes) as a suggested alternative.

Deprecate VCF write functions

Superseded by #1264

Dask 2024.8.1 and later is very slow

I've opened https://github.com/dask/dask/issues/11416

Dask 2024.8.1 and later is very slow

Unfortunately, it looks like Dask 2024.10.0 doesn't fix this. See https://github.com/sgkit-dev/sgkit/actions/runs/11551276595 which is taking 19 minutes to run, rather than 6 (with Dask 2024.08.0).

Dask 2024.8.1 and later is very slow

On further investigation what's happening is that locally defined functions that are passed to Dask `map_blocks` and that wrap Numba functions are being recompiled every time the (genomics) method is...
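
To illustrate why a locally defined function can defeat a per-function compilation cache, here is a pure-Python analogy. It does not use Numba or Dask, and it is not a confirmed diagnosis of the sgkit case: `FakeJit` is a hypothetical stand-in for a JIT decorator that "compiles" once per wrapped function object, and the kernel and method names are invented for the example. Defining the kernel inside the method creates a new function object on every call, so the "compile" step repeats, whereas a module-level kernel is compiled once.

```python
compile_count = 0


class FakeJit:
    """Hypothetical stand-in for a JIT decorator: compiles lazily, once per instance."""

    def __init__(self, func):
        self.func = func
        self.compiled = False

    def __call__(self, *args):
        global compile_count
        if not self.compiled:
            compile_count += 1  # pretend this is an expensive compilation
            self.compiled = True
        return self.func(*args)


def method_with_local_kernel(x):
    # A new FakeJit instance is created on every call, so it always "compiles".
    @FakeJit
    def kernel(v):
        return v * 2

    return kernel(x)


@FakeJit
def module_kernel(v):
    return v * 2


def method_with_module_kernel(x):
    # Reuses the single module-level instance, so it "compiles" only once.
    return module_kernel(x)


for i in range(3):
    method_with_local_kernel(i)
local_compiles = compile_count

compile_count = 0
for i in range(3):
    method_with_module_kernel(i)
module_compiles = compile_count
```

Under these assumptions, hoisting the wrapped function to module level is the change that avoids the repeated work.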

Dask 2024.8.1 and later is very slow

I've fixed the non-distance functions in this commit: https://github.com/sgkit-dev/sgkit/pull/1261/commits/e83b52cdf1ef1b305eefdd8bcaca55b437cc4e4b I'm not sure what to do about the distance functions at this point.