mudata icon indicating copy to clipboard operation
mudata copied to clipboard

Fixes for the new update()

Open ilia-kats opened this issue 2 months ago • 4 comments

  • add unit tests for the new update()
  • remove redundant/obsolete code and streamline the new update()
  • fix bugs and make it pass all tests, in particular with duplicated index entries and different axes

ilia-kats avatar Sep 30 '25 09:09 ilia-kats

Thanks, @ilia-kats! I especially appreciate the code reduction — it's a great direction, and maybe we can eliminate even more code in the .update() functionality to make it simpler in future.

Because it's a difficult part of the codebase to track the logic across so many lines, if you think there are more comments explaining the logic that can help this to be more maintainable, feel free to add them!

The only question before we merge it is about potential performance regressions with the changes to the .update() details: do you know if the execution time was impacted by any of the changes?

gtca avatar Oct 09 '25 23:10 gtca

I haven't measured performance. On the one hand, I would expect it to be faster since I replaced this entire code block at the bottom of the function with a single call to data_global.index.get_indexer() further up, but on the other hand it now does a full comparison of the modmaps for some modalities, which will be slower. However, that was the only way I could think of to make all the tests pass (i.e. the previous version was buggy and didn't work in some cases), so I don't think we have much choice there. But I'll be happy to run some benchmarks if you can point me to a particularly nasty dataset performance-wise, I've never had issues with update() performance myself.

ilia-kats avatar Oct 10 '25 06:10 ilia-kats

fix bugs and make it pass all tests, in particular with duplicated index entries and different axes

Can you highlight which lines fix which bugs?

I'm afraid not for everything. The previous version was not being tested in the unit tests, so I just enabled the tests and tried to make them all pass. This particularly concerns the first commit in this series. The others are more incremental, where one commit introduces some tests and makes them pass. I can tell you that these changes fix the case where there are duplicate indices and stuff in obsm/varm: The previous version (the big code block that I removed) attempted to reconstruct the mapping from old indices to new indices from scratch and was buggy. We have this information at the point where we join data_mod with data_global, but it is then discarded as data_mod is prepared to become the new .obs/.var. So now I just get the mapping from data_mod and data_global.

ilia-kats avatar Oct 29 '25 15:10 ilia-kats

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 98.12%. Comparing base (0d646ed) to head (b0091b2). :warning: Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
+ Coverage   96.78%   98.12%   +1.33%     
==========================================
  Files          13       13              
  Lines         872     1011     +139     
==========================================
+ Hits          844      992     +148     
+ Misses         28       19       -9     
Files with missing lines Coverage Δ
tests/test_update.py 99.23% <100.00%> (+8.11%) :arrow_up:
:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 03 '25 12:11 codecov[bot]