pycytominer
Collate and SingleCells to accept fewer/different compartments
Description
I'm proposing these changes to Collate and SingleCells so that they accept a different number of cell compartments, or even just one compartment. Related to issue #272.
This is still a work in progress.
So far, (1) I've added the option in collate.py to accept 3 flags, for no-cells, no-cytoplasm, or no-nuclei; (2) the check in `assert_linking_cols_complete` only happens when there's more than one compartment.
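To illustrate point (2), here is a minimal sketch of a completeness check that is skipped for a single compartment. `check_linking_cols` is a hypothetical stand-in written for this description, not the actual `assert_linking_cols_complete` code, and the error message is made up.

```python
def check_linking_cols(linking_cols, compartments):
    """Hypothetical stand-in for a linking-column completeness check."""
    if len(compartments) <= 1:
        # With a single compartment there is nothing to link, so skip the check
        return
    # Assumption: every requested compartment needs its own entry
    missing = [c for c in compartments if c not in linking_cols]
    if missing:
        raise ValueError(f"No linking columns given for: {', '.join(missing)}")


# A single compartment passes even with an empty dictionary
check_linking_cols({}, ["cells"])
```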
Proposed changes/discussion
From some discussion with @bethac07, we saw two options:
- The first option is just to add more documentation to SingleCells (which I did), and expect the user to build their dictionary and provide it as `compartment_linking_cols`. If that's how you'd like to do it, I think the changes I already did would be enough to merge.
- OR build the dictionary for `compartment_linking_cols` based on the compartments given:
Now: `compartment_linking_cols` is defined as `default_linking_cols` if no dictionary is specified by the user. Also, for SingleCells to work right now with only one compartment, you must give a dictionary linking the compartment to itself, for example: `{"cells": {"cells": "ObjectNumber"}}`, which works, but I don't know if that's the right way to do it.
So, to build the dictionary:
- [ ] Have some sort of template dictionary like `{"compA": {"parent": {"compB": "object_Parent_compB"}, "child": {"compA": "ObjectNumber"}}}`, which would be filled in from the user's compartments. I just don't know the best way to handle this: how do we know the child-parent relationship for objects other than nuclei, cells, and cytoplasm if the user is not giving that info?
- [ ] To deal only with combinations of nuclei, cytoplasm, and cells objects (if the user wants only one or two of those compartments), have a dictionary like `par_child_dict = {"nuclei": {"cells", "cytoplasm"}, "cells": {"cytoplasm"}}` from which the relationship can be inferred, and build `compartment_linking_cols` based on `par_child_dict`.
- [ ] In merge_single_cells, something needs to change where the dataframe `sc_df` is being merged, so that it works with only one compartment without the need to provide a dictionary that links the compartment to itself.
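As a rough sketch of the `par_child_dict` idea, the linking dictionary could be inferred roughly like this. `build_linking_cols` is a hypothetical helper, and the `<Child>_Parent_<Parent>` column naming is an assumption; neither exists in the codebase as written here.

```python
# Parent -> children relationships, as in the par_child_dict proposal above
PAR_CHILD_DICT = {"nuclei": {"cells", "cytoplasm"}, "cells": {"cytoplasm"}}


def build_linking_cols(compartments):
    """Infer a compartment_linking_cols-style dictionary (hypothetical helper)."""
    selected = [c.lower() for c in compartments]
    linking_cols = {}
    for parent, children in PAR_CHILD_DICT.items():
        if parent not in selected:
            continue
        for child in sorted(children):
            if child not in selected:
                continue
            # Assumed naming: the child table holds "<Child>_Parent_<Parent>"
            parent_col = f"{child.capitalize()}_Parent_{parent.capitalize()}"
            linking_cols.setdefault(child, {})[parent] = parent_col
            # The parent side is matched on its own object number
            linking_cols.setdefault(parent, {})[child] = "ObjectNumber"
    return linking_cols


two_compartments = build_linking_cols(["cells", "cytoplasm"])
one_compartment = build_linking_cols(["nuclei"])
```

In this sketch a single compartment yields an empty dictionary, so the self-linking workaround would only go away if merge_single_cells also handled the empty case.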
What is the nature of your change?
- [ ] Bug fix (fixes an issue).
- [x] Enhancement (adds functionality).
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected).
- [x] This change requires a documentation update.
Checklist
- [x] I have read the CONTRIBUTING.md guidelines.
- [x] My code follows the style guidelines of this project.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [x] I have made corresponding changes to the documentation.
- [x] My changes generate no new warnings.
- [ ] New and existing unit tests pass locally with my changes.
- [ ] I have added tests that prove my fix is effective or that my feature works.
- [x] I have deleted all non-relevant text in this pull request template.