pertpy
Improve DE tutorial
Description of feature
In addition to the basic DE tutorial that shows the basic functionality of DE methods, we should have an advanced tutorial that shows how to build more complex contrasts, including interaction terms based on a real-world multi-condition dataset/atlas.
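To show what "more complex contrasts, including interaction terms" means at the level of the linear model, here is a minimal sketch. It builds the design matrix for `~ condition * cluster` by hand with pandas (treatment coding, with `ctrl` and cluster `A` as reference levels) on a made-up pseudobulk sample sheet, and expresses contrasts as weight vectors over the fitted coefficients. It also shows why adding patients as covariates can make the matrix rank-deficient when each patient belongs to exactly one condition. The sample sheet, column names and coefficients are all hypothetical, and this is not pertpy's API; it only illustrates the underlying linear algebra.

```python
import numpy as np
import pandas as pd

# Hypothetical sample sheet: 4 patients (2 per condition), each contributing
# one pseudobulk sample per cluster.
samples = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"],
    "condition": ["ctrl"] * 4 + ["treat"] * 4,
    "cluster": ["A", "B"] * 4,
})

# Treatment coding for "~ condition * cluster": ctrl and A are absorbed into the intercept.
treat = (samples["condition"] == "treat").astype(float)
clus_b = (samples["cluster"] == "B").astype(float)
X = pd.DataFrame({
    "Intercept": 1.0,
    "condition[treat]": treat,
    "cluster[B]": clus_b,
    "condition[treat]:cluster[B]": treat * clus_b,
})
print("rank:", np.linalg.matrix_rank(X.to_numpy()), "of", X.shape[1])  # 4 of 4

# Noise-free toy "expression" generated from known coefficients, so the fit is exact.
true_beta = np.array([5.0, 1.0, 0.5, 2.0])
y = X.to_numpy() @ true_beta
beta, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)

# A contrast is a weight vector over the coefficients (same column order as X).
contrasts = {
    "treat vs ctrl in A": np.array([0, 1, 0, 0]),
    "treat vs ctrl in B": np.array([0, 1, 0, 1]),
    "interaction (treat effect in B minus in A)": np.array([0, 0, 0, 1]),
}
for name, c in contrasts.items():
    print(f"{name}: {c @ beta:.2f}")

# Patient is nested in condition: condition[treat] == patient_p3 + patient_p4,
# so adding patient dummies makes the design matrix rank-deficient.
patients = pd.get_dummies(samples["patient"], prefix="patient", drop_first=True, dtype=float)
X_pat = pd.concat([X, patients], axis=1)
print("with patients:", np.linalg.matrix_rank(X_pat.to_numpy()), "of", X_pat.shape[1])  # 6 of 7

# Dropping the condition main effect restores full rank.
X_fix = X_pat.drop(columns="condition[treat]")
print("without condition main effect:", np.linalg.matrix_rank(X_fix.to_numpy()), "of", X_fix.shape[1])  # 6 of 6
```

Dropping `condition[treat]` is one common way out, but it has a cost: the plain treat-vs-ctrl comparison in the reference cluster is then no longer estimable, because it is absorbed by the patient effects. Only within-patient comparisons, such as the condition-by-cluster interaction, remain.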
Some work was started here, but didn't progress far: https://github.com/scverse/multi-condition-comparisions/blob/main/docs/notebooks/breast_cancer.ipynb
Added a rough draft https://github.com/scverse/pertpy-tutorials/blob/main/differential_gene_expression.ipynb
TODOs
- [ ] Maybe make patients work as independent covariates (currently this leads to a design matrix that is not full rank)
- [ ] Better explain what the DE results mean and how they can be used next (maybe with a quick-and-dirty decoupler/GSEA/similar analysis)
- [ ] Better explain the models
- [ ] Better explain how design matrices and contrasts work (just 1-2 sentences each)
- [ ] Incorporate better visualizations as they become available in pertpy
- [ ] Show how to use scanpy plotting functions with the DE results
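For the TODO on showing what the DE results can be used for, one quick-and-dirty option is an over-representation test: given a list of significant DE genes, ask whether a gene set is hit more often than expected by chance, using a one-sided hypergeometric test. The sketch below is pure Python and stands in for a proper decoupler or GSEA run; it is not either tool's API. The gene names, the gene set and the DE list are made up, and how DE genes are selected from the results table (e.g. an adjusted p-value cutoff) is left as an assumption.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def ora(de_genes, gene_set, universe):
    """Over-representation of gene_set among de_genes, restricted to the universe."""
    universe = set(universe)
    de = set(de_genes) & universe
    gs = set(gene_set) & universe
    k = len(de & gs)  # DE genes that fall into the gene set
    return k, hypergeom_sf(k, len(universe), len(gs), len(de))


# Hypothetical example: 100 measured genes, a 10-gene set, 10 DE genes, 5 of them in the set.
universe = [f"g{i}" for i in range(100)]
gene_set = universe[:10]
de_genes = universe[:5] + universe[50:55]
k, p = ora(de_genes, gene_set, universe)
print(f"overlap={k}, p={p:.2e}")
```

Restricting both lists to the measured genes matters: using the whole genome as the background would make almost any overlap look significant.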
Are the resources linked here still relevant, or are they outdated by now? I don't know the ins and outs of DE in pertpy well enough to tackle this myself, but I could help finalize this by walking through a tutorial and trying to adapt it to a dataset of mine, providing feedback on what is missing or how to change the tutorial to make it work for me. Once I get it to work (probably with some help from you guys), I should know the process well enough to contribute to the tutorial draft so we can finalize it.
I'd say that they're still relevant generally. The thing is, we had thought of decoupling the DE part from pertpy into its own package as pertpy is still a bit heavy for just that functionality. Therefore, DE in pertpy hasn't received much love for a while.
You'd need to adapt https://github.com/scverse/pertpy-tutorials/blob/main/differential_gene_expression.ipynb but ideally with a dataset that is already in pertpy.
I noticed that the existing documentation even for the simple cases seems to be outdated: The tutorial uses
`pertpy.tools.PseudobulkSpace().compute(adata, target_col="Patient", groups_col="Cluster", layer_key="counts", mode="sum", min_cells=10, min_counts=1000)`
even though the corresponding current API documentation does not mention any min_cells or min_counts parameters.
Sorry, that's on me! I'll take care of this soon. If you remove these two parameters, it should hopefully still work (on main).
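If the filtering that `min_cells` and `min_counts` provided is still wanted after dropping them, one option is to apply it manually. Below is a minimal pandas sketch of what the two parameters presumably did in the tutorial call: sum raw counts per (`Patient`, `Cluster`) combination and drop pseudobulk samples built from fewer than `min_cells` cells or with fewer than `min_counts` total counts. The helper `pseudobulk_sum`, the gene names and the toy data are hypothetical and not part of pertpy; matching the old parameter semantics exactly is an assumption.

```python
import numpy as np
import pandas as pd


def pseudobulk_sum(counts, obs, target_col, groups_col, min_cells=10, min_counts=1000):
    """Hypothetical helper: sum counts per (target, group) and drop small pseudobulks.

    counts: cells x genes array of raw counts; obs: per-cell metadata in the same row order.
    """
    genes = [f"gene{i}" for i in range(counts.shape[1])]
    df = pd.DataFrame(counts, columns=genes, index=obs.index)
    df[target_col] = obs[target_col].to_numpy()
    df[groups_col] = obs[groups_col].to_numpy()
    # observed=True skips empty combinations if the columns are categorical.
    grouped = df.groupby([target_col, groups_col], observed=True)
    pb = grouped[genes].sum()
    meta = pd.DataFrame({"n_cells": grouped.size(), "total_counts": pb.sum(axis=1)})
    keep = (meta["n_cells"] >= min_cells) & (meta["total_counts"] >= min_counts)
    return pb[keep], meta[keep]


# Toy data: (patient, cluster, number of cells, per-cell counts for two genes).
groups = [("p1", "A", 20, [50, 50]), ("p1", "B", 5, [300, 300]),
          ("p2", "A", 15, [10, 10]), ("p2", "B", 12, [60, 40])]
obs = pd.DataFrame([(p, c) for p, c, n, _ in groups for _ in range(n)],
                   columns=["Patient", "Cluster"])
counts = np.vstack([np.tile(row, (n, 1)) for _, _, n, row in groups])

pb, meta = pseudobulk_sum(counts, obs, "Patient", "Cluster", min_cells=10, min_counts=1000)
print(meta)
```

Here (p1, B) is dropped for having too few cells and (p2, A) for having too few counts, so only (p1, A) and (p2, B) remain.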
I also have plans to ensure that all tutorials always run through via CI.
No worries, just wanted to report it here. Especially since I saw your plans to (re-)enable all notebooks via CI and I noticed that this is even still included in your draft linked above.
Passing the output of pertpy.tools.PseudobulkSpace().compute() to dc.pl.filter_samples() (as shown in the docs) is also not possible anymore:
AssertionError: psbulk_* columns not present in adata.obs, this function should be used after running decoupler.pp.pseudobulk
Ouch, I see, yeah. That's also because I moved from decoupler to scanpy pseudobulk. I'll take care of it soon!
There are also examples still using the unsupported parameters: https://github.com/scverse/pertpy/blob/e6aaeb28537c382a5d714763e6ae0d645eb22ef7/pertpy/tools/_differential_gene_expression/_base.py#L92-L93
dc.swap_layer is now dc.pp.swap_layer and its X_layer_key was renamed to simply X_key.
Also, dc.get_metadata_associations and dc.plot_associations have moved to dc.tl.rankby_obsm and dc.pl.obsm, respectively, and their APIs have changed as well.
The tutorial is fixed in https://github.com/scverse/pertpy/pull/814
It still uses decoupler (although decoupler is no longer a dependency of pertpy). I don't yet have the time to overhaul it completely.
Thanks @mschilli87 for all the helpful pointers!