spatialdata icon indicating copy to clipboard operation
spatialdata copied to clipboard

fix categories import

Open jonas2612 opened this issue 2 months ago • 6 comments

Possible solution to Issue #1009

jonas2612 avatar Oct 30 '25 08:10 jonas2612

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 92.08%. Comparing base (79cf8c9) to head (8425f34).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1010   +/-   ##
=======================================
  Coverage   92.08%   92.08%           
=======================================
  Files          48       48           
  Lines        7446     7446           
=======================================
  Hits         6857     6857           
  Misses        589      589           
Files with missing lines Coverage Δ
src/spatialdata/models/models.py 88.61% <100.00%> (ø)
:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Oct 30 '25 08:10 codecov[bot]

Hi @jonas2612, Thanks for the PR. However, before I look further into this, could you please check whether the problem persists when you use the PR that unpins dask, #1006? I had a different issue with partitioning, but it could be connected so want to see if it would be fixed.

Also, in general, cat.as_known in the newer dask versions does not necessarily preserve the order anymore, so it might be better to wait with this PR a tiny bit as we expect to have the unpinning dask PR reviewed soon. It would then be better to branch of from that point given what I just mentioned.

melonora avatar Oct 30 '25 08:10 melonora

Hi @melonora. Yes, of course. I'll check it today and get back to you

jonas2612 avatar Oct 30 '25 09:10 jonas2612

Cheers, please also check this https://github.com/melonora/spatialdata/blob/e017ca7d6107623750196606a07fe8e4407c242f/src/spatialdata/_core/operations/rasterize.py#L674-L677.

This might provide some context for what I mentioned in my message above.

melonora avatar Oct 30 '25 09:10 melonora

That indeed looks very similar to the bug I get. Still, I checked it with your PR too, but the error still exists: image Here, I expect either 500 or 550 genes, but only receive 17.

I understand the difficulty, although do not understand the background well, why the ordering of the categories is important. But if it's better for you, I can branch off and redo the change at a later time point after the PR is reviewed

jonas2612 avatar Oct 30 '25 09:10 jonas2612

@jonas2612, the reason why it is important is because as_known is basically reassigning a column. If the order is changed, a particular category belonging to a certain index can have changed.

melonora avatar Oct 30 '25 12:10 melonora