[FEA] Dask-array based statistics on single cell data
Is your feature request related to a problem? Please describe.
This may be a tall ask, but it would be great to have GPU acceleration for single-cell modeling. The current standard for highly accurate modeling on large, complex human datasets is MAST (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5), or simply pseudobulking. Wilcoxon tests, t-tests, and similar methods have significant statistical flaws that undermine the accuracy of their results when applied to biological questions (such as disease vs. healthy comparisons).
Even at sub-million cell counts, MAST is slow. At 1+ million cells, it becomes unbearably slow. Being able to run a MAST-like analysis on a Dask array-based AnnData would truly unlock complex statistical analysis of large-scale scRNAseq data.
Describe the solution you'd like
Dask-array based statistical modeling of scRNAseq data, built on the known principles/variables that have been worked out by the MAST authors.
Dask-array based linear modeling has been implemented here: https://ml.dask.org/modules/generated/dask_ml.linear_model.LinearRegression.html
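To make the request more concrete, here is a minimal sketch of what chunked, per-gene linear modeling on Dask arrays could look like. It is only an illustration under stated assumptions: it fits ordinary least squares via the normal equations using plain `dask.array` and NumPy, it does not use the dask-ml API linked above, and it is not MAST's hurdle model. The function name `fit_ols_per_gene` and the synthetic data are hypothetical.

```python
import dask
import dask.array as da
import numpy as np


def fit_ols_per_gene(X, Y):
    """Fit one ordinary least squares model per gene (hypothetical helper).

    X: dask array, shape (n_cells, n_covariates), the design matrix.
    Y: dask array, shape (n_cells, n_genes), the expression matrix.
    Returns a NumPy array of coefficients, shape (n_covariates, n_genes).
    """
    # Both products are reduced chunk by chunk over the cell axis,
    # so the full expression matrix never has to sit in memory at once.
    xtx = X.T @ X  # (n_covariates, n_covariates), small
    xty = X.T @ Y  # (n_covariates, n_genes)
    xtx, xty = dask.compute(xtx, xty)
    # Solve the normal equations for all genes in one call.
    return np.linalg.solve(xtx, xty)


# Synthetic example (assumed data, not real scRNAseq counts).
rng = np.random.default_rng(0)
n_cells, n_genes = 10_000, 50
condition = rng.integers(0, 2, size=n_cells)  # e.g. healthy = 0, disease = 1
design = np.column_stack([np.ones(n_cells), condition])  # intercept + condition
true_beta = rng.normal(size=(2, n_genes))
expr = design @ true_beta + rng.normal(scale=0.1, size=(n_cells, n_genes))

# Chunk along the cell axis only; covariates and genes stay in one chunk here.
X = da.from_array(design, chunks=(2_000, 2))
Y = da.from_array(expr, chunks=(2_000, n_genes))

beta_hat = fit_ols_per_gene(X, Y)
print(beta_hat.shape)
```

A MAST-like method would additionally need a logistic component for detection and a Gaussian component restricted to expressing cells, plus covariates such as cellular detection rate; the sketch only shows that the linear-algebra core maps naturally onto chunked arrays.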
Is there a CPU based implementation
A link to an implementation or paper with the suggested functionality