pertpy
pertpy copied to clipboard
Add permutation test
PR Checklist
- [ ] Referenced issue is linked
- [x] If you've fixed a bug or added code that should be tested, add tests!
- [x] Documentation in
docsis updated
Description of changes
Adds a PermutationTest to the tools submodule, similar to TTest and WilcoxonTest.
Usage:
result = PermutationTest.compare_groups(
pdata,
column="group",
baseline="A",
groups_to_compare=["B"],
test=pertpy.tools.WilcoxonTest, # optional, defaults to WilcoxonTest
n_permutations=100, # optional, defaults to 100
seed=42, # optional, defaults to 0
)
Technical details
- Currently no specific reference p-values for other permutation tests to compare to exist and the standard Wilcoxon values from R deviate from the results of the PermutationTest. However, there is full agreement regarding which genes are significant and I have adapted the test to check for this.
- Needed to reimplement
compare_groupsto have the number of permutations and the test to use after permutation as explicit parameters and to parallelize processing. - Users need to provide the test they want to use after permutation themselves, if they don't want to use the standard
WilcoxonTest.
Additional context
Part of the scverse x owkin hackathon.
Cool! Let me know when you want me or Gregor to have a look, please.
@Zethson @grst based on the tests I ran locally, this version should now pass. However, there seems to be a docs issue unrelated to my code changes (see other recent PRs), because a dependency cannot be installed. So from my side, you can go ahead and take a look already and let me know if anything needs to be adapted.
Another idea that I had was that it would be interesting to be able to compare any values in obs and obsm as well. One use case would be if you have a spatial transcriptomics image for each sample within a group for which you can calculate Moran's I at the sample level (for each gene or a single gene of interest). You may want to store this data not in its own pdata but rather in the metadata, so flexibilizing the compare_groups function to not be restricted to pdata.var_names for the variables that are compared would be a nice addition.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 72.68%. Comparing base (28b8291) to head (52d2d58).
Additional details and impacted files
@@ Coverage Diff @@
## main #726 +/- ##
==========================================
- Coverage 72.74% 72.68% -0.06%
==========================================
Files 47 47
Lines 5510 5517 +7
==========================================
+ Hits 4008 4010 +2
- Misses 1502 1507 +5
| Files with missing lines | Coverage Δ | |
|---|---|---|
| pertpy/tools/__init__.py | 77.77% <100.00%> (ø) |
|
| ...py/tools/_differential_gene_expression/__init__.py | 92.30% <100.00%> (-0.29%) |
:arrow_down: |
| ...y/tools/_differential_gene_expression/_pydeseq2.py | 91.89% <100.00%> (ø) |
|
| ...ols/_differential_gene_expression/_simple_tests.py | 97.77% <100.00%> (+0.21%) |
:arrow_up: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
If you merge main into this, the RTD job will work again.
@maltekuehl, could the paralellization you implemented be pushed up to the abstract base class? Then also wilcoxon and ttest would benefit from it. In that case, would it even be necessary to re-implement the interfaces specifically for the permutation test, or could you just use an extension class?
@grst good idea to push this upstream. The reason I had to recreate the compare_groups function was that I wanted to explicitly expose the seed, test and n_permutations parameters, as these are key to how the permutation test works and should imo not just be passed through test_kwargs without further documentation. We could however add these as unsued parameters for the other classes to the base class or we could move the functionality of compare_groups to a helper function and then just overwrite the call to that helper with an update of the test_kwargs. This would however mean that we would have to have essentially the same parameters for both the function and the helper, leading perhaps to unnecessary code duplication. What are your thoughts?
@Zethson what would you suggest naming wise? The docs mention a pertpy.tools.PermutationTest but this does not actually seem to be implemented in the code, hence I went with this name. However, the docs should be updated, but that relates to the distance functionality which you are more familiar with. The docstring for DistanceTest also mentions permutation tests but that could perhaps be rephrased to "Monte-Carlo simulation" or be specified in some other way? For now, I however think that outside the non-existing (or only now created) function in the docs that there is little potential for confusion.
@grst @maltekuehl what's the status of this PR now?
I still need to give it a proper review, didn't have the time yet
@Zethson from my side all of your comments are addressed but waiting on the comments from @grst. Hadnt checked back because I'm on holiday but can make any necessary changes next week.
Had to see @grst in person to remember this PR, now all comments should have been addressed. I greatly simplified everything by focussing on callables only, not wrapping other simple tests. Basically just a small wrapper around the scipy version now. Moved the extra arguments to the permutation test itself and out of the base class. Test failing in 3.13 --pre is unrelated to this PR and on main, too.
Looking this weekend. Sorry for the delay.