pycytominer

Collate and SingleCells to accept fewer/different compartments

Open fefossa opened this issue 1 year ago • 1 comment

Description

I'm proposing these changes to Collate and SingleCells so that they accept a different number of cell compartments, or even just one compartment. Related to issue #272.

This is still a work in progress.

So far: (1) I've added an option in collate.py to accept 3 flags, for no-cells, no-cytoplasm, or no-nuclei; (2) the check in assert_linking_cols_complete now only runs when there is more than one compartment.
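To illustrate point (2), here is a minimal sketch of gating the linking-column check on the number of compartments. The function name check_linking_cols and its body are hypothetical stand-ins written for this description; they are not the actual code of assert_linking_cols_complete.

```python
def check_linking_cols(compartments, linking_cols):
    """Hypothetical stand-in for the linking-column completeness check.

    Skips the check entirely when only one compartment is given,
    because there is nothing to link it to.
    """
    if len(compartments) <= 1:
        return  # single compartment: no linking columns required

    # Assumption: every compartment needs at least one linking entry.
    missing = [c for c in compartments if not linking_cols.get(c)]
    if missing:
        raise ValueError(f"Missing linking columns for: {missing}")


# A single compartment passes even with an empty dictionary.
check_linking_cols(["cells"], {})
```

With two or more compartments, the same call raises a ValueError for any compartment that has no linking entry.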

Proposed changes/discussion

From some discussion with @bethac07, we saw two options:

  1. The first option is just to add more documentation to SingleCells (which I did) and expect the user to build their own dictionary and pass it as compartment_linking_cols. If that's how you'd like to do it, I think the changes I've already made are enough to merge.

  2. OR build the dictionary for compartment_linking_cols based on the compartments given:

Current behavior: compartment_linking_cols falls back to default_linking_cols if the user does not specify a dictionary. Also, for SingleCells to work with only one compartment right now, you must pass a dictionary that links the compartment to itself, for example: { "cells": {"cells": "ObjectNumber"}}. This works, but I don't know if it's the right way to do it.
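The current behavior described above can be sketched as follows. The contents of DEFAULT_LINKING_COLS and the helper resolve_linking_cols are assumptions made for illustration; the library's real defaults and internals may differ.

```python
# Illustrative placeholder only; the library's actual defaults may differ.
DEFAULT_LINKING_COLS = {
    "cytoplasm": {
        "cells": "Cytoplasm_Parent_Cells",
        "nuclei": "Cytoplasm_Parent_Nuclei",
    },
    "cells": {"cytoplasm": "ObjectNumber"},
    "nuclei": {"cytoplasm": "ObjectNumber"},
}


def resolve_linking_cols(compartment_linking_cols=None):
    """Hypothetical helper: fall back to the defaults when nothing is given."""
    if compartment_linking_cols is None:
        return DEFAULT_LINKING_COLS
    return compartment_linking_cols


# Workaround for one compartment: link "cells" to itself.
single_compartment = resolve_linking_cols({"cells": {"cells": "ObjectNumber"}})
```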

So, to build the dictionary:

  • [ ] Have a template dictionary such as {"compA": {"parent":{"compB":"object_Parent_compB"}, "child":{"compA":"ObjectNumber"}}} that is filled in from the user's compartments. I don't know the best way to handle this: for objects other than nuclei, cells, and cytoplasm, we cannot tell which is the parent and which is the child unless the user provides that information.
  • [ ] To handle only combinations of nuclei, cytoplasm, and cells objects (when the user wants just one or two of those compartments), keep a dictionary like par_child_dict = {"nuclei": {"cells", "cytoplasm"}, "cells":{"cytoplasm"}} from which the relationships can be inferred, and build compartment_linking_cols from par_child_dict.
  • [ ] In merge_single_cells, the step where the sc_df dataframe is merged needs to change so that it works with a single compartment without requiring a dictionary that links the compartment to itself.
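As a rough sketch of the par_child_dict idea above, the linking dictionary could be derived from whichever compartments the user selects. The function build_linking_cols and the "Child_Parent_Parent" column naming are assumptions made for illustration, not existing pycytominer code, and the result is not guaranteed to match default_linking_cols.

```python
# Parent -> children relationships, as proposed in this description.
PAR_CHILD_DICT = {"nuclei": {"cells", "cytoplasm"}, "cells": {"cytoplasm"}}


def build_linking_cols(compartments, par_child_dict=PAR_CHILD_DICT):
    """Hypothetical builder for compartment_linking_cols.

    Only relationships whose parent and child are both selected are kept,
    so a single compartment yields an empty mapping for that compartment.
    """
    selected = set(compartments)
    linking = {comp: {} for comp in sorted(selected)}
    for parent, children in par_child_dict.items():
        if parent not in selected:
            continue
        for child in sorted(children & selected):
            # Assumed column naming: "<Child>_Parent_<Parent>".
            linking[child][parent] = (
                f"{child.capitalize()}_Parent_{parent.capitalize()}"
            )
            # The parent is matched on its own object number.
            linking[parent][child] = "ObjectNumber"
    return linking


two_compartments = build_linking_cols(["cells", "nuclei"])
one_compartment = build_linking_cols(["cells"])
```

Here one_compartment comes out as {"cells": {}}, which is where merge_single_cells would still need to skip the merge step rather than rely on a self-link.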

What is the nature of your change?

  • [ ] Bug fix (fixes an issue).
  • [x] Enhancement (adds functionality).
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • [x] This change requires a documentation update.

Checklist

  • [x] I have read the CONTRIBUTING.md guidelines.
  • [x] My code follows the style guidelines of this project.
  • [x] I have performed a self-review of my own code.
  • [x] I have commented my code, particularly in hard-to-understand areas.
  • [x] I have made corresponding changes to the documentation.
  • [x] My changes generate no new warnings.
  • [ ] New and existing unit tests pass locally with my changes.
  • [ ] I have added tests that prove my fix is effective or that my feature works.
  • [x] I have deleted all non-relevant text in this pull request template.

fefossa, Jun 26 '23 13:06