
[WIP] Mp/dispersion smoothing

Open • picciama opened this issue 2 years ago • 3 comments

This branch contains the dispersion-smoothing functionality:

  • [x] write wrapper around the training procedure in the training_procedure function.
  • [x] check for missing optional import of scikit-fda
  • [x] implement sctransform-like scale param dispersion smoothing procedure
  • [ ] implement final mean model refit after dispersion smoothing procedure
  • [ ] write unit test for dispersion smoothing using sctransform with test data -> test for deviation from true scale param
  • [ ] check dask array support in sctransform code
  • [ ] check if exponentiation of scale param is always correct
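The sctransform-like item above regularizes per-gene dispersion by smoothing it against gene mean. Below is a minimal NumPy sketch. It assumes per-gene means and NB size parameters theta (inverse dispersion) have already been estimated, and it applies a Gaussian-kernel (Nadaraya-Watson) regression of log10(theta) on log10(mean). The function name and the fixed bandwidth are hypothetical. As far as we understand, sctransform picks its bandwidth from the data, and this sketch is not batchglm's implementation.

```python
import numpy as np


def smooth_log_theta(gene_mean, theta, bandwidth=0.3):
    """Kernel-smooth per-gene NB size parameters against gene mean.

    Hypothetical helper: Nadaraya-Watson regression with a Gaussian kernel
    of log10(theta) on log10(gene_mean), back-transformed with 10**x.
    `bandwidth` is a fixed, illustrative value on the log10-mean scale.
    """
    x = np.log10(np.asarray(gene_mean, dtype=float))
    y = np.log10(np.asarray(theta, dtype=float))
    # Gaussian kernel weight between every pair of genes (G x G matrix).
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    # Weighted average of log10(theta) over genes with a similar mean.
    y_smooth = (w @ y) / w.sum(axis=1)
    return 10.0 ** y_smooth
```

The pairwise weight matrix is G x G. For tens of thousands of genes you would fit the smoother on a subset or on binned genes; sctransform itself estimates its parameters on a subset of genes by default.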

Optional TODOs:

  • [ ] implement DESeq2 approach (doesn't smooth outliers, maybe not applicable here) + unit test
  • [ ] implement edgeR approach (will be moved to edgePy package eventually)
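For the optional DESeq2 item, here is our understanding of DESeq2's defaults. It fits a parametric mean-dispersion trend alpha(mu) = a0 + a1/mu and shrinks gene-wise estimates toward it. Genes whose log gene-wise dispersion lies more than 2 robust SDs above the trend keep their gene-wise value; these are the outliers that are not smoothed. The sketch below covers only the trend fit and that outlier rule. It uses ordinary least squares as a simplification of DESeq2's iterative gamma-family GLM fit, and both function names are hypothetical.

```python
import numpy as np


def fit_parametric_trend(mean, disp):
    """Least-squares fit of disp ~ a0 + a1 / mean (simplified stand-in
    for DESeq2's gamma-family GLM). Returns the coefficients (a0, a1)."""
    mean = np.asarray(mean, dtype=float)
    disp = np.asarray(disp, dtype=float)
    design = np.column_stack([np.ones_like(mean), 1.0 / mean])
    coef, *_ = np.linalg.lstsq(design, disp, rcond=None)
    return coef


def flag_dispersion_outliers(mean, disp, coef, n_sd=2.0):
    """True for genes whose log dispersion exceeds the log trend by more
    than n_sd robust standard deviations (MAD scaled by 1.4826)."""
    trend = coef[0] + coef[1] / np.asarray(mean, dtype=float)
    resid = np.log(np.asarray(disp, dtype=float)) - np.log(trend)
    sd = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    return resid > n_sd * sd
```

In this scheme, flagged genes would keep their gene-wise dispersion and all other genes would be shrunk toward the trend.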

picciama, May 27 '22 14:05

Hi,

"implement edgeR approach": this is actually a very important use case for batchGLM. We are planning to use batchGLM/diffxpy for a pure Python implementation of MILO, and we promised to be able to 1:1 replace

    from rpy2.robjects.packages import importr
    edgeR = importr("edgeR")  # edgeR called from Python via rpy2

    dge = edgeR.DGEList(counts=count_mat[keep_nhoods,:][:,keep_smp], lib_size=lib_size[keep_smp])
    dge = edgeR.calcNormFactors(dge, method="TMM")
    dge = edgeR.estimateDisp(dge, model)
    fit = edgeR.glmQLFit(dge, model, robust=True)

eventually. I would kindly ask you to strongly consider implementing this as well. Having the edgeR and DESeq2 approaches implemented here would also greatly boost the impact. I have no doubt about this.

Zethson, Jun 15 '22 09:06
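The first step of the edgeR pipeline above is `calcNormFactors(dge, method="TMM")`. As a reference for porting, here is a NumPy sketch of trimmed mean of M-values (TMM) normalization. It follows our reading of edgeR's defaults: 30% trimming of log-ratios (M), 5% trimming of average log-expression (A), precision weighting, and a reference sample chosen by the upper quartile. The function names are hypothetical. The sketch skips edgeR's fallback reference choice for very sparse data and is not batchglm code.

```python
import numpy as np


def _average_rank(x):
    """1-based ranks with ties averaged (the default behaviour of R's rank())."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inv = np.unique(x, return_inverse=True)
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]


def _tmm_factor(obs, ref, log_ratio_trim=0.3, sum_trim=0.05):
    """TMM factor of sample `obs` relative to reference sample `ref`."""
    n_o, n_r = obs.sum(), ref.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        log_r = np.log2((obs / n_o) / (ref / n_r))              # M values
        abs_e = (np.log2(obs / n_o) + np.log2(ref / n_r)) / 2   # A values
        var = (n_o - obs) / n_o / obs + (n_r - ref) / n_r / ref  # asymptotic variance of M
    fin = np.isfinite(log_r) & np.isfinite(abs_e)
    log_r, abs_e, var = log_r[fin], abs_e[fin], var[fin]
    if log_r.size == 0 or np.max(np.abs(log_r)) < 1e-6:
        return 1.0
    n = len(log_r)
    lo_l = np.floor(n * log_ratio_trim) + 1
    hi_l = n + 1 - lo_l
    lo_s = np.floor(n * sum_trim) + 1
    hi_s = n + 1 - lo_s
    rank_r, rank_e = _average_rank(log_r), _average_rank(abs_e)
    keep = (rank_r >= lo_l) & (rank_r <= hi_l) & (rank_e >= lo_s) & (rank_e <= hi_s)
    # Precision-weighted mean of the trimmed M values.
    f = np.sum(log_r[keep] / var[keep]) / np.sum(1.0 / var[keep])
    return 2.0 ** f if np.isfinite(f) else 1.0


def tmm_norm_factors(counts):
    """Hypothetical helper: TMM normalization factors for a genes x samples matrix."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[(counts > 0).any(axis=1)]  # drop genes with zero counts everywhere
    lib_size = counts.sum(axis=0)
    # Reference: the sample whose upper-quartile proportion is closest to the
    # mean upper quartile across samples.
    f75 = np.quantile(counts / lib_size, 0.75, axis=0)
    ref = counts[:, int(np.argmin(np.abs(f75 - f75.mean())))]
    f = np.array([_tmm_factor(counts[:, j], ref) for j in range(counts.shape[1])])
    # Scale so that the factors multiply to one.
    return f / np.exp(np.mean(np.log(f)))
```

As in edgeR, the effective library size of a sample is its library size times its factor. In a toy case where one sample's library is doubled only by a single highly expressed gene, the effective library sizes of the two samples come out equal.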

> I would kindly ask you to strongly consider implementing this as well. Having the edgeR and DESeq2 approaches implemented here would also greatly boost the impact. I have no doubt about this.

Most definitely. I have already had a look at the edgeR source code, and it shouldn't be too complicated to port this to batchGLM. I will start implementing it tomorrow, but I can't yet estimate how long it will take.

I think the main part will be porting estimateDisp, i.e. edgeR's GLM procedure with trend.method="locfit", with the GLM fitting done by batchGLM instead. I will do this first and check which of the function's arguments are needed for this configuration. Once that is implemented, I'll see what else needs to be ported to Python. Let me know if you have any specific dataset in mind that's well suited for testing the batchGLM procedure. I'll create a Jupyter notebook in the batchglm_tutorials repo, and we could maybe do some fancy rpy2 stuff to compare directly against edgeR.

picciama, Jun 20 '22 14:06
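The core idea of estimateDisp that would need porting is a weighted-likelihood empirical Bayes step. Each gene's dispersion log-likelihood, evaluated on a grid of dispersion values, is combined with a shared curve pooled from genes of similar abundance. The sketch below illustrates only that idea and is heavily simplified. It uses the plain NB log-likelihood instead of edgeR's Cox-Reid adjusted profile likelihood, a moving average over abundance-ranked genes instead of the locfit trend, and a fixed, hypothetical prior weight `prior_n`. It also assumes an intercept-only design, equal library sizes, and no genes with all-zero counts, and it does no interpolation between grid points. All names are hypothetical.

```python
import math

import numpy as np

# Element-wise log-gamma; math.lgamma keeps the sketch free of SciPy.
_lgamma = np.vectorize(math.lgamma)


def nb_loglik_grid(counts, mu, grid):
    """NB log-likelihood of every gene (row) at every dispersion in `grid`.

    counts: genes x samples; mu: fitted means, broadcastable to counts.
    """
    ll = np.empty((counts.shape[0], len(grid)))
    for k, phi in enumerate(grid):
        r = 1.0 / phi  # NB size parameter
        ll[:, k] = (
            _lgamma(counts + r) - _lgamma(r) - _lgamma(counts + 1.0)
            + counts * np.log(mu * phi / (1.0 + mu * phi))
            - r * np.log1p(mu * phi)
        ).sum(axis=1)
    return ll


def tagwise_dispersion(counts, prior_n=10.0, window=51):
    """Hypothetical weighted-likelihood (empirical Bayes) dispersion sketch."""
    counts = np.asarray(counts, dtype=float)
    grid = 0.1 * 2.0 ** np.linspace(-10, 10, 21)  # log-spaced dispersion grid
    mu = counts.mean(axis=1, keepdims=True)  # MLE of the mean for intercept-only NB
    ll = nb_loglik_grid(counts, mu, grid)
    # Shared curve: average the likelihood curves of genes with similar
    # abundance (a moving window standing in for edgeR's locfit trend).
    order = np.argsort(mu[:, 0])
    half = window // 2
    shared = np.empty_like(ll)
    for pos, gene in enumerate(order):
        neighbours = order[max(0, pos - half): pos + half + 1]
        shared[gene] = ll[neighbours].mean(axis=0)
    # Each gene maximizes its own curve plus prior_n times the shared curve.
    return grid[np.argmax(ll + prior_n * shared, axis=1)]
```

A larger `prior_n` pulls each gene's estimate more strongly toward the abundance trend. In edgeR this weight is derived from prior degrees of freedom rather than fixed by hand.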

Amazing @picciama

> Let me know if you have any specific dataset in mind that's well suited for testing the batchGLM procedure. I'll create a Jupyter notebook in the batchglm_tutorials repo, and we could maybe do some fancy rpy2 stuff to compare directly against edgeR.

Would it be too crazy to simply compare, for example, the DE results of edgeR and of the new Python reimplementation on a small dataset or simulation? I know that edgeR's model does a lot, but this might be the eventual goal?

Thank you!

Zethson, Jun 20 '22 15:06
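One way to make the proposed DE comparison concrete is sketched below. It assumes per-gene p-values and log fold changes have been exported, in the same gene order, from edgeR (e.g. from a quasi-likelihood test on the `glmQLFit` result) and from the Python reimplementation. Both sets are adjusted with Benjamini-Hochberg, and the sketch reports the overlap (Jaccard index) of genes called significant plus the correlation of log fold changes. The choice of metrics and the function names are hypothetical, not an agreed-upon benchmark.

```python
import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (same result as R's p.adjust(p, "BH"))."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adj = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out


def compare_de(p_ref, p_new, lfc_ref, lfc_new, alpha=0.05):
    """Agreement between two DE result sets for the same genes."""
    sig_ref = bh_adjust(p_ref) < alpha
    sig_new = bh_adjust(p_new) < alpha
    union = np.sum(sig_ref | sig_new)
    # If neither method calls anything significant, count that as full agreement.
    jaccard = np.sum(sig_ref & sig_new) / union if union else 1.0
    lfc_corr = np.corrcoef(lfc_ref, lfc_new)[0, 1]
    return {"jaccard_significant": jaccard, "logfc_pearson": lfc_corr}
```

A Jaccard index near 1 together with a log fold change correlation near 1 would indicate that the reimplementation reproduces edgeR's calls on that dataset.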