fix categories import
Possible solution to Issue #1009
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 92.08%. Comparing base (79cf8c9) to head (8425f34).
Additional details and impacted files
@@ Coverage Diff @@
## main #1010 +/- ##
=======================================
Coverage 92.08% 92.08%
=======================================
Files 48 48
Lines 7446 7446
=======================================
Hits 6857 6857
Misses 589 589
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/spatialdata/models/models.py | 88.61% <100.00%> (ø) | |
Hi @jonas2612, thanks for the PR. Before I look further into this, could you please check whether the problem persists when you use #1006, the PR that unpins dask? I had a different issue with partitioning, but it could be connected, so I want to see whether that PR fixes it.
Also, in general, cat.as_known in newer dask versions does not necessarily preserve the category order anymore, so it might be better to hold off on this PR for a bit, as we expect the dask-unpinning PR to be reviewed soon. Given what I just mentioned, it would then be better to branch off from that point.
Hi @melonora. Yes, of course. I'll check it today and get back to you.
Cheers, please also check this https://github.com/melonora/spatialdata/blob/e017ca7d6107623750196606a07fe8e4407c242f/src/spatialdata/_core/operations/rasterize.py#L674-L677.
This might provide some context for what I mentioned in my message above.
That indeed looks very similar to the bug I get.
I also checked it with your PR, but the error persists:
Here, I expect either 500 or 550 genes, but only receive 17.
I understand the difficulty, although I do not understand the background well enough to see why the ordering of the categories is important. But if it's better for you, I can branch off and redo the change later, after the PR is reviewed.
@jonas2612, it is important because as_known basically reassigns the column. If the order of the categories changes, the category assigned to a given index can change.
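To illustrate the point about ordering, here is a minimal sketch using plain pandas rather than dask. The gene labels are made up, and the sorted order only stands in for "categories coming back in a different order"; it is not a reproduction of what as_known does internally. A categorical column stores integer codes that index into the category list, so reusing the old codes with a reordered list silently changes which label each row carries.

```python
import pandas as pd

# Hypothetical labels; the category order is deliberately not alphabetical.
original = pd.Categorical(
    ["geneB", "geneA", "geneC", "geneA"],
    categories=["geneB", "geneA", "geneC"],
)
codes = original.codes  # integer positions into the category list

# Assumption for illustration: the categories come back in a different
# (here: sorted) order.
reordered_categories = sorted(original.categories)

# Unsafe: the old codes are interpreted against the new category order,
# so rows end up with different labels than before.
mislabeled = pd.Categorical.from_codes(codes, categories=reordered_categories)

# Safe: set_categories remaps by label, so every row keeps its value
# and only the underlying codes change.
relabeled = original.set_categories(reordered_categories)

print(list(original))
print(list(mislabeled))
print(list(relabeled))
```

If the category order has to change, going through the labels (as set_categories does) keeps each row's value intact, whereas anything that carries over the raw codes does not.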