pertpy icon indicating copy to clipboard operation
pertpy copied to clipboard

Add permutation test

Open maltekuehl opened this issue 8 months ago • 10 comments

PR Checklist

  • [ ] Referenced issue is linked
  • [x] If you've fixed a bug or added code that should be tested, add tests!
  • [x] Documentation in docs is updated

Description of changes

Adds a PermutationTest to the tools submodule, similar to TTest and WilcoxonTest.

Usage:

result = PermutationTest.compare_groups(
    pdata,
    column="group",
    baseline="A",
    groups_to_compare=["B"],
    test=pertpy.tools.WilcoxonTest, # optional, defaults to WilcoxonTest
    n_permutations=100, # optional, defaults to 100
    seed=42, # optional, defaults to 0
)

Technical details

  • Currently no specific reference p-values for other permutation tests to compare to exist and the standard Wilcoxon values from R deviate from the results of the PermutationTest. However, there is full agreement regarding which genes are significant and I have adapted the test to check for this.
  • Needed to reimplement compare_groups to have the number of permutations and the test to use after permutation as explicit parameters and to parallelize processing.
  • Users need to provide the test they want to use after permutation themselves, if they don't want to use the standard WilcoxonTest.

Additional context

Part of the scverse x owkin hackathon.

maltekuehl avatar Mar 18 '25 11:03 maltekuehl

Cool! Let me know when you want me or Gregor to have a look, please.

Zethson avatar Mar 18 '25 14:03 Zethson

@Zethson @grst based on the tests I ran locally, this version should now pass. However, there seems to be a docs issue unrelated to my code changes (see other recent PRs), because a dependency cannot be installed. So from my side, you can go ahead and take a look already and let me know if anything needs to be adapted.

Another idea that I had was that it would be interesting to be able to compare any values in obs and obsm as well. One use case would be if you have a spatial transcriptomics image for each sample within a group for which you can calculate Moran's I at the sample level (for each gene or a single gene of interest). You may want to store this data not in its own pdata but rather in the metadata, so flexibilizing the compare_groups function to not be restricted to pdata.var_names for the variables that are compared would be a nice addition.

maltekuehl avatar Mar 18 '25 14:03 maltekuehl

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 72.68%. Comparing base (28b8291) to head (52d2d58).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #726      +/-   ##
==========================================
- Coverage   72.74%   72.68%   -0.06%     
==========================================
  Files          47       47              
  Lines        5510     5517       +7     
==========================================
+ Hits         4008     4010       +2     
- Misses       1502     1507       +5     
Files with missing lines Coverage Δ
pertpy/tools/__init__.py 77.77% <100.00%> (ø)
...py/tools/_differential_gene_expression/__init__.py 92.30% <100.00%> (-0.29%) :arrow_down:
...y/tools/_differential_gene_expression/_pydeseq2.py 91.89% <100.00%> (ø)
...ols/_differential_gene_expression/_simple_tests.py 97.77% <100.00%> (+0.21%) :arrow_up:

... and 2 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Mar 18 '25 14:03 codecov-commenter

If you merge main into this, the RTD job will work again.

Zethson avatar Mar 18 '25 15:03 Zethson

@maltekuehl, could the paralellization you implemented be pushed up to the abstract base class? Then also wilcoxon and ttest would benefit from it. In that case, would it even be necessary to re-implement the interfaces specifically for the permutation test, or could you just use an extension class?

grst avatar Mar 18 '25 15:03 grst

@grst good idea to push this upstream. The reason I had to recreate the compare_groups function was that I wanted to explicitly expose the seed, test and n_permutations parameters, as these are key to how the permutation test works and should imo not just be passed through test_kwargs without further documentation. We could however add these as unsued parameters for the other classes to the base class or we could move the functionality of compare_groups to a helper function and then just overwrite the call to that helper with an update of the test_kwargs. This would however mean that we would have to have essentially the same parameters for both the function and the helper, leading perhaps to unnecessary code duplication. What are your thoughts?

maltekuehl avatar Mar 18 '25 15:03 maltekuehl

@Zethson what would you suggest naming wise? The docs mention a pertpy.tools.PermutationTest but this does not actually seem to be implemented in the code, hence I went with this name. However, the docs should be updated, but that relates to the distance functionality which you are more familiar with. The docstring for DistanceTest also mentions permutation tests but that could perhaps be rephrased to "Monte-Carlo simulation" or be specified in some other way? For now, I however think that outside the non-existing (or only now created) function in the docs that there is little potential for confusion.

maltekuehl avatar Mar 18 '25 16:03 maltekuehl

@grst @maltekuehl what's the status of this PR now?

Zethson avatar Apr 28 '25 00:04 Zethson

I still need to give it a proper review, didn't have the time yet

grst avatar Apr 28 '25 05:04 grst

@Zethson from my side all of your comments are addressed but waiting on the comments from @grst. Hadnt checked back because I'm on holiday but can make any necessary changes next week.

maltekuehl avatar Apr 28 '25 12:04 maltekuehl

Had to see @grst in person to remember this PR, now all comments should have been addressed. I greatly simplified everything by focussing on callables only, not wrapping other simple tests. Basically just a small wrapper around the scipy version now. Moved the extra arguments to the permutation test itself and out of the base class. Test failing in 3.13 --pre is unrelated to this PR and on main, too.

maltekuehl avatar Sep 11 '25 18:09 maltekuehl

Looking this weekend. Sorry for the delay.

Zethson avatar Oct 02 '25 11:10 Zethson