fix : enable vst be used "blindly" and to fit its own dispersion
Reference Issue or PRs
Fix #263
What does your PR implement? Be specific.
Proposed solution :
- A
self.fit_typeis defined at class initialisation and will by default set the fit_type for DEA and VST. - If needed, a fit type can be passed to deseq2() and vst() in order to launch the two analysis with a separate fit_type. It will set self.fit_type to the user one.
- self.vst_fit() and self.vst_transform() will call internally the self.fit_type.
- vst_transform() should always be called after vst_fit() or deseq2(). Raise an exception if needed trend_coef have not been computed by deseq2() or vst_fit()
- Add unit tests
Thanks for the PR @laudmt!
I think it does the trick for VST, but I have a little concern regarding the fact that providing a fit_type other than None overwrites a DeseqDataSet's trend_fit_type.
Here's an example of problematic / ambiguous behavior that could occur:
dds = DeseqDataSet(counts=counts, metadata=metadata, trend_fit_type="parametric")
# Start by computing VST
dds.vst(fit_type="mean")
# Then perform DEA
dds.deseq2() # what trend_fit_type is used to compute the curve ?
In the above example, dds.deseq2() would use trend_fit_type = "mean" (because it was overwritten by vst), but the user probably intended to fit VST with a curve and then run DEA with a parametric curve.
I think we should find a way to make the choices of curve fit type for vst() and deseq2() completely independent.
Thanks for the PR @laudmt!
I think it does the trick for VST, but I have a little concern regarding the fact that providing a
fit_typeother thanNoneoverwrites aDeseqDataSet'strend_fit_type.Here's an example of problematic / ambiguous behavior that could occur:
dds = DeseqDataSet(counts=counts, metadata=metadata, trend_fit_type="parametric") # Start by computing VST dds.vst(fit_type="mean") # Then perform DEA dds.deseq2() # what trend_fit_type is used to compute the curve ?In the above example,
dds.deseq2()would usetrend_fit_type = "mean"(because it was overwritten byvst), but the user probably intended to fit VST with a curve and then run DEA with a parametric curve.I think we should find a way to make the choices of curve fit type for
vst()anddeseq2()completely independent.
Indeed thank you, I was missing this use case. I will then store two different fit type one for dea and one for vst and enable self.fit_X methods to take a fit_type in argument.
Hi @laudmt, I've pushed a commit in which I implemented the solution I suggested earlier, i.e. setting two separate fit_type and vst_fit_type attributes and duplicating fields (with or without a "vst_" prefix) to avoid data leakage.
Could you have a look when you have time and tell me if that would work for you :) ?
Thank you @BorisMuzellec I can confirm that this fix the use of pydeseq2 with multiple covariate on my side.