PyDESeq2 icon indicating copy to clipboard operation
PyDESeq2 copied to clipboard

fix : enable vst be used "blindly" and to fit its own dispersion

Open laudmt opened this issue 1 year ago • 2 comments

Reference Issue or PRs

Fix #263

What does your PR implement? Be specific.

Proposed solution :

  • A self.fit_type is defined at class initialisation and will by default set the fit_type for DEA and VST.
  • If needed, a fit type can be passed to deseq2() and vst() in order to launch the two analysis with a separate fit_type. It will set self.fit_type to the user one.
  • self.vst_fit() and self.vst_transform() will call internally the self.fit_type.
  • vst_transform() should always be called after vst_fit() or deseq2(). Raise an exception if needed trend_coef have not been computed by deseq2() or vst_fit()
  • Add unit tests

laudmt avatar Apr 09 '24 12:04 laudmt

Thanks for the PR @laudmt!

I think it does the trick for VST, but I have a little concern regarding the fact that providing a fit_type other than None overwrites a DeseqDataSet's trend_fit_type.

Here's an example of problematic / ambiguous behavior that could occur:

dds = DeseqDataSet(counts=counts, metadata=metadata, trend_fit_type="parametric")
# Start by computing VST
dds.vst(fit_type="mean")
# Then perform DEA
dds.deseq2() # what trend_fit_type is used to compute the curve ?

In the above example, dds.deseq2() would use trend_fit_type = "mean" (because it was overwritten by vst), but the user probably intended to fit VST with a curve and then run DEA with a parametric curve.

I think we should find a way to make the choices of curve fit type for vst() and deseq2() completely independent.

BorisMuzellec avatar Apr 10 '24 08:04 BorisMuzellec

Thanks for the PR @laudmt!

I think it does the trick for VST, but I have a little concern regarding the fact that providing a fit_type other than None overwrites a DeseqDataSet's trend_fit_type.

Here's an example of problematic / ambiguous behavior that could occur:

dds = DeseqDataSet(counts=counts, metadata=metadata, trend_fit_type="parametric")
# Start by computing VST
dds.vst(fit_type="mean")
# Then perform DEA
dds.deseq2() # what trend_fit_type is used to compute the curve ?

In the above example, dds.deseq2() would use trend_fit_type = "mean" (because it was overwritten by vst), but the user probably intended to fit VST with a curve and then run DEA with a parametric curve.

I think we should find a way to make the choices of curve fit type for vst() and deseq2() completely independent.

Indeed thank you, I was missing this use case. I will then store two different fit type one for dea and one for vst and enable self.fit_X methods to take a fit_type in argument.

laudmt avatar Apr 10 '24 09:04 laudmt

Hi @laudmt, I've pushed a commit in which I implemented the solution I suggested earlier, i.e. setting two separate fit_type and vst_fit_type attributes and duplicating fields (with or without a "vst_" prefix) to avoid data leakage.

Could you have a look when you have time and tell me if that would work for you :) ?

BorisMuzellec avatar May 23 '24 14:05 BorisMuzellec

Thank you @BorisMuzellec I can confirm that this fix the use of pydeseq2 with multiple covariate on my side.

laudmt avatar May 30 '24 14:05 laudmt