Improve DE tutorial

Open grst opened this issue 1 year ago • 12 comments

Description of feature

In addition to the basic DE tutorial that shows the basic functionality of DE methods, we should have an advanced tutorial that shows how to build more complex contrasts, including interaction terms based on a real-world multi-condition dataset/atlas.

Some work was started here, but it didn't progress far: https://github.com/scverse/multi-condition-comparisions/blob/main/docs/notebooks/breast_cancer.ipynb
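To make "more complex contrasts, including interaction terms" concrete, here is a minimal plain-Python sketch. It does not use the pertpy API; the factors `treatment` and `time`, their levels, and the column names are made up for illustration. It assumes a treatment-coded design for `~ treatment * time` and builds each contrast as the difference between two design-matrix rows.

```python
# Column order of a hypothetical design matrix for ~ treatment * time
# (treatment coding, reference levels: treatment="ctrl", time="early").
COLUMNS = ["Intercept", "treatment[drug]", "time[late]", "treatment[drug]:time[late]"]


def design_row(treatment, time):
    """Return the design-matrix row for one combination of factor levels."""
    is_drug = int(treatment == "drug")
    is_late = int(time == "late")
    return [1, is_drug, is_late, is_drug * is_late]


def contrast(group_a, group_b):
    """Contrast vector for 'group_a minus group_b' as a difference of design rows."""
    return [a - b for a, b in zip(design_row(**group_a), design_row(**group_b))]


# Effect of the drug at the late time point.
drug_at_late = contrast(
    {"treatment": "drug", "time": "late"},
    {"treatment": "ctrl", "time": "late"},
)

# Effect of the drug at the early time point.
drug_at_early = contrast(
    {"treatment": "drug", "time": "early"},
    {"treatment": "ctrl", "time": "early"},
)

# Interaction: does the drug effect differ between the late and early time points?
interaction = [late - early for late, early in zip(drug_at_late, drug_at_early)]

for name, vec in [
    ("drug@late", drug_at_late),
    ("drug@early", drug_at_early),
    ("interaction", interaction),
]:
    print(name, dict(zip(COLUMNS, vec)))
```

The interaction contrast ends up selecting only the interaction coefficient, which is why a "difference of differences" question maps onto a single term in this kind of model.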

grst avatar May 27 '24 07:05 grst

Added a rough draft https://github.com/scverse/pertpy-tutorials/blob/main/differential_gene_expression.ipynb

TODOs

  • [ ] Maybe make patients work as independent covariates (currently leads to the design matrix not being full rank)
  • [ ] Better explain what the DE results mean and how they can be used downstream (maybe a quick-and-dirty decoupler/GSEA/something analysis)
  • [ ] Better explain the models
  • [ ] Better explain how design matrices and contrasts work (just 1-2 sentences each)
  • [ ] Incorporate better visualizations as they become available in pertpy
  • [ ] Show how to use scanpy plotting functions with the DE results
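For the first TODO, one common way a design matrix loses full rank is when each patient belongs to exactly one condition, so the condition column is a sum of patient columns. Whether that is what happens in the draft tutorial is an assumption; the sample layout below is invented, and the sketch uses plain Python rather than pertpy.

```python
from fractions import Fraction


def build_design(samples):
    """Treatment-coded design for ~ condition + patient.

    `samples` is a list of (patient, condition) tuples; the first level of
    each factor (in sorted order) is used as the reference.
    """
    patients = sorted({p for p, _ in samples})
    conditions = sorted({c for _, c in samples})
    columns = (
        ["Intercept"]
        + [f"condition[{c}]" for c in conditions[1:]]
        + [f"patient[{p}]" for p in patients[1:]]
    )
    rows = [
        [1]
        + [int(cond == c) for c in conditions[1:]]
        + [int(pat == p) for p in patients[1:]]
        for pat, cond in samples
    ]
    return columns, rows


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Nested layout: every patient is seen in only one condition (two samples each).
nested = [(p, "control") for p in ("P1", "P1", "P2", "P2")] + [
    (p, "treated") for p in ("P3", "P3", "P4", "P4")
]
# Paired layout: every patient is seen in both conditions.
paired = [("P1", "control"), ("P1", "treated"), ("P2", "control"), ("P2", "treated")]

for name, samples in [("nested", nested), ("paired", paired)]:
    columns, rows = build_design(samples)
    print(name, "columns:", len(columns), "rank:", matrix_rank(rows))
```

In the nested layout the rank is smaller than the number of columns, so the patient and condition effects cannot be estimated separately; in the paired layout the same formula is full rank.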

Zethson avatar Jun 12 '24 14:06 Zethson

Are the resources linked here still relevant, or are they outdated by now? I don't know the ins and outs of DE in pertpy well enough to tackle this myself, but I could help finalize it by walking through a tutorial, trying to adapt it to a dataset of mine, and providing feedback on what is missing or what needs to change to make it work for me. Once I get it to work (probably with some help from you guys), I should know the process well enough to contribute to the tutorial draft so we can finalize this.

mschilli87 avatar Jun 16 '25 07:06 mschilli87

I'd say that they're still relevant generally. The thing is, we had thought of decoupling the DE part from pertpy into its own package as pertpy is still a bit heavy for just that functionality. Therefore, DE in pertpy hasn't received much love for a while.

You'd need to adapt https://github.com/scverse/pertpy-tutorials/blob/main/differential_gene_expression.ipynb but ideally with a dataset that is already in pertpy.

Zethson avatar Jun 16 '25 19:06 Zethson

I noticed that the existing documentation even for the simple cases seems to be outdated: The tutorial uses

pertpy.tools.PseudobulkSpace().compute(
    adata, target_col="Patient", groups_col="Cluster",
    layer_key="counts", mode="sum",
    min_cells=10, min_counts=1000
)

even though the corresponding current API documentation does not mention any min_cells or min_counts parameters.

mschilli87 avatar Jul 01 '25 09:07 mschilli87

Sorry, that's on me! I'll take care of this soon. If you remove these two parameters, it should hopefully still work (on main).
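Until the tutorial is updated, one generic way to cope with removed parameters is to drop keyword arguments that the target function no longer accepts. The sketch below uses only the standard library; `compute_stub` is a hypothetical stand-in and does not reproduce the real signature of `PseudobulkSpace.compute`.

```python
import inspect


def split_supported_kwargs(func, kwargs):
    """Split kwargs into those `func` accepts by name and those it does not."""
    params = inspect.signature(func).parameters
    # A function that takes **kwargs accepts any keyword, so nothing is dropped.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    supported = {k: v for k, v in kwargs.items() if k in params}
    dropped = {k: v for k, v in kwargs.items() if k not in params}
    return supported, dropped


# Hypothetical stand-in for the current function; NOT the real pertpy signature.
def compute_stub(adata, target_col, groups_col, layer_key=None, mode="sum"):
    return {
        "target_col": target_col,
        "groups_col": groups_col,
        "layer_key": layer_key,
        "mode": mode,
    }


# Keyword arguments as they appear in the outdated tutorial call.
tutorial_kwargs = dict(
    target_col="Patient",
    groups_col="Cluster",
    layer_key="counts",
    mode="sum",
    min_cells=10,
    min_counts=1000,
)

supported, dropped = split_supported_kwargs(compute_stub, tutorial_kwargs)
result = compute_stub(None, **supported)  # None stands in for an AnnData object
print("dropped:", sorted(dropped))
```

Silently dropping arguments can hide real behavior changes (here, the filtering that `min_cells` and `min_counts` used to request), so it is probably better to report the dropped names than to ignore them.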

I also have plans to ensure that all tutorials always run through via CI.

Zethson avatar Jul 01 '25 09:07 Zethson

No worries, just wanted to report it here. Especially since I saw your plans to (re-)enable all notebooks via CI and I noticed that this is even still included in your draft linked above.

mschilli87 avatar Jul 01 '25 09:07 mschilli87

Passing the output of pertpy.tools.PseudobulkSpace().compute() to dc.pl.filter_samples() (as shown in the docs) is also not possible anymore:

AssertionError: psbulk_* columns not present in adata.obs, this function should be used after running decoupler.pp.pseudobulk

mschilli87 avatar Jul 01 '25 11:07 mschilli87

Ouch, I see, yeah. That's also because I moved from decoupler to scanpy pseudobulk. I'll take care of it soon!

Zethson avatar Jul 01 '25 11:07 Zethson

There are also examples still using the unsupported parameters: https://github.com/scverse/pertpy/blob/e6aaeb28537c382a5d714763e6ae0d645eb22ef7/pertpy/tools/_differential_gene_expression/_base.py#L92-L93

mschilli87 avatar Jul 01 '25 12:07 mschilli87

dc.swap_layer is now dc.pp.swap_layer, and its X_layer_key parameter was renamed to X_key.

mschilli87 avatar Jul 02 '25 13:07 mschilli87

Also, dc.get_metadata_associations and dc.plot_associations have moved to dc.tl.rankby_obsm and dc.pl.obsm, respectively, and their APIs have changed as well.
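For notebooks that need to run against both the old and the new layout, a small lookup helper could pick whichever location exists. This is only a sketch: the two `SimpleNamespace` objects are stand-ins for an old and a new decoupler module, and their lambda signatures are assumptions, with only the names `swap_layer`, `pp.swap_layer`, `X_layer_key` and `X_key` taken from the comments above.

```python
from types import SimpleNamespace


def resolve_first(module, dotted_paths):
    """Return (object, path) for the first dotted path that exists on `module`."""
    for path in dotted_paths:
        obj = module
        try:
            for part in path.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # this location does not exist, try the next one
        return obj, path
    raise AttributeError(f"none of {dotted_paths} found on {module!r}")


def call_swap_layer(module, adata, layer):
    """Call swap_layer at its new or old location with the matching keyword."""
    func, path = resolve_first(module, ["pp.swap_layer", "swap_layer"])
    # The new location is reported to use X_key, the old one X_layer_key.
    key_name = "X_key" if path == "pp.swap_layer" else "X_layer_key"
    return func(adata, layer, **{key_name: "X"})


# Stand-ins for the old and new module layouts (hypothetical signatures).
old_dc = SimpleNamespace(
    swap_layer=lambda adata, layer, X_layer_key=None: ("old", layer, X_layer_key)
)
new_dc = SimpleNamespace(
    pp=SimpleNamespace(
        swap_layer=lambda adata, layer, X_key=None: ("new", layer, X_key)
    )
)

print(call_swap_layer(old_dc, None, "counts"))
print(call_swap_layer(new_dc, None, "counts"))
```

The same pattern would not be enough for dc.tl.rankby_obsm and dc.pl.obsm, since their APIs changed and not just their location.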

mschilli87 avatar Jul 02 '25 15:07 mschilli87

The tutorial is fixed in https://github.com/scverse/pertpy/pull/814

It still uses decoupler (although decoupler is no longer a dependency of pertpy). I don't yet have the time to overhaul it completely.

Thanks @mschilli87 for all the helpful pointers!

Zethson avatar Jul 05 '25 09:07 Zethson