
collate.py fails with fewer compartments

Open · fefossa opened this issue 1 year ago · 4 comments

Issue: missing compartments break collate.py and SingleCells

Sometimes an assay does not include all compartments, for example when only the cells were stained with a dye and the nuclei and cytoplasm were not. That was my case: I used Distributed-CellProfiler for the analysis and obtained CSVs only for the Cells object.

Then, when I tried to run the backend step, the following error appeared, because only Cells was present while collate tries to ingest and index every compartment:

Error: in prepare, no such table: main.Cytoplasm was generated when running sqlite3 /home/ubuntu/ebs_tmp/backend/2022_03_24_Acidification_LT/211021_102106_Plate_1/211021_102106_Plate_1.sqlite CREATE INDEX IF NOT EXISTS table_image_object_cytoplasm_idx ON Cytoplasm(TableNumber, ImageNumber, ObjectNumber);. Exiting.
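To illustrate the failing step, here is a minimal sketch of indexing only the requested compartments. It uses Python's built-in sqlite3 module rather than the sqlite3 command-line call shown in the error; the helper name index_compartments is hypothetical, and the index name and columns are copied from the CREATE INDEX statement in that error.

```python
import sqlite3


def index_compartments(conn, compartments):
    """Hypothetical helper: create per-object indexes for the given compartments only.

    The statement matches the one in the error message, but the loop runs over
    `compartments` instead of a fixed Cells/Cytoplasm/Nuclei list.
    """
    created = []
    for compartment in compartments:
        index_name = f"table_image_object_{compartment.lower()}_idx"
        # Table and index names cannot be bound as SQL parameters, so they are
        # interpolated here; `compartments` is assumed to be a trusted list.
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS {index_name} "
            f"ON {compartment}(TableNumber, ImageNumber, ObjectNumber)"
        )
        created.append(index_name)
    return created


if __name__ == "__main__":
    # In-memory stand-in for a backend that only contains a Cells table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Cells (TableNumber, ImageNumber, ObjectNumber)")
    print(index_compartments(conn, ["Cells"]))
    try:
        # A fixed three-compartment list hits the missing Cytoplasm table.
        index_compartments(conn, ["Cells", "Cytoplasm", "Nuclei"])
    except sqlite3.OperationalError as err:
        print("fails as reported:", err)
```

With only ["Cells"], a single index is created; with the fixed three-compartment list, sqlite3 raises OperationalError on the missing Cytoplasm table, which is the same failure as in the report.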

Changes

To support a different number of compartments, including just one, I'm proposing where the changes would go. I haven't run tests yet, so I'm posting here to see whether anyone has ideas.

  • [x] Add a new --compartments argument to collate_cmd;

  • [x] In collate.py, add the new argument "compartments" with ["Cells", "Nuclei", "Cytoplasm"] as default;

  • [x] In collate.py, replace this line with the following to accept a different number of compartments:

```python
# Build one "--include */<Compartment>.csv" filter per requested compartment
include_list = [f"--include */{eachcompartment}.csv" for eachcompartment in compartments]
sync_cmd = f"aws s3 sync --exclude * {' '.join(include_list)} --include */Image.csv {remote_input_dir} {input_dir}"
```
  • [x] In collate.py, change the loop here to iterate with for eachcompartment in compartments instead of over the hard-coded list;

  • [ ] Because collate.py calls SingleCells to aggregate profiles, we would also need to pass a linking dictionary here that matches the compartments. Maybe change the get_default_linking_cols function to generate the dictionary from the given compartments, instead of asking the user to specify "compartment_linking_cols" inside SingleCells. (Have predefined column names for the three objects and combine them into one dictionary based on the hierarchy: Nuclei can be a parent of Cells and Cytoplasm, and Cells can be a parent of Cytoplasm.)

    ```python
    nuclei_parent = ["Cells_Parent_Nuclei", "Cytoplasm_Parent_Nuclei"]
    cells_parent = ["Cytoplasm_Parent_Cells"]
    cells_children = ["Cells_Parent_Nuclei"]
    ```

    Then build the dictionary from those lists with a few conditionals and loops. I'm not sure how to do that, or whether it's the best solution. Something similar was mentioned in #216.

  • [ ] Figure out how to run aggregate_profiles with only one compartment. Is there a way to skip get_default_linking_cols if len(compartments) == 1?
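For the --compartments argument and the sync command, a minimal sketch might look like the following, assuming an argparse-based CLI. make_parser and build_sync_cmd are hypothetical names, and the bucket and paths in the usage lines are placeholders. Returning the aws s3 sync call as an argument list lets it go to subprocess.run without a shell, so the * patterns reach aws unexpanded.

```python
import argparse

# Default mirrors the list proposed for collate.py.
DEFAULT_COMPARTMENTS = ["Cells", "Nuclei", "Cytoplasm"]


def make_parser():
    """Hypothetical parser fragment: only the proposed --compartments flag is shown."""
    parser = argparse.ArgumentParser(description="collate (sketch)")
    parser.add_argument(
        "--compartments",
        nargs="+",  # one or more names, e.g. --compartments Cells
        default=DEFAULT_COMPARTMENTS,
        help="Compartment CSVs to download, ingest and index",
    )
    return parser


def build_sync_cmd(compartments, remote_input_dir, input_dir):
    """Return the aws s3 sync call as an argument list (one --include per compartment)."""
    cmd = ["aws", "s3", "sync", "--exclude", "*"]
    for compartment in compartments:
        cmd += ["--include", f"*/{compartment}.csv"]
    # Image.csv is always needed, whatever compartments were measured.
    cmd += ["--include", "*/Image.csv", remote_input_dir, input_dir]
    return cmd


if __name__ == "__main__":
    args = make_parser().parse_args(["--compartments", "Cells"])
    print(build_sync_cmd(args.compartments, "s3://example-bucket/plate", "/tmp/plate"))
```

Keeping one --include per compartment means the default three-compartment run downloads the same files as the original hard-coded command, while a Cells-only assay downloads just Cells.csv and Image.csv.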
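The dictionary-building idea from the unchecked items could be sketched as below. It assumes the nested {compartment: {linked_compartment: column}} layout that, as far as I can tell, pycytominer's get_default_linking_cols returns: cytoplasm links to cells and nuclei through Cytoplasm_Parent_* columns, and the parents link back through ObjectNumber. PARENT_LINKS and build_linking_cols are hypothetical names, and whether SingleCells accepts the two-compartment result has not been tested.

```python
# Hypothetical parent relationships, following the hierarchy described in this
# issue: Nuclei can parent Cells and Cytoplasm; Cells can parent Cytoplasm.
# Each entry is (child, parent, column in the child table pointing at the parent).
PARENT_LINKS = [
    ("cytoplasm", "cells", "Cytoplasm_Parent_Cells"),
    ("cytoplasm", "nuclei", "Cytoplasm_Parent_Nuclei"),
    ("cells", "nuclei", "Cells_Parent_Nuclei"),
]


def build_linking_cols(compartments):
    """Hypothetical helper: build the linking dictionary from `compartments`.

    Returns {compartment: {linked_compartment: column}}; parents link back to
    children through ObjectNumber. A single compartment yields {} (nothing to merge).
    """
    present = {c.lower() for c in compartments}
    linking_cols = {}
    for child, parent, column in PARENT_LINKS:
        if child not in present or parent not in present:
            continue
        # With Cytoplasm present, Cells and Nuclei are already joined through it,
        # so the direct Cells -> Nuclei link is only used when Cytoplasm is absent.
        if {child, parent} == {"cells", "nuclei"} and "cytoplasm" in present:
            continue
        linking_cols.setdefault(child, {})[parent] = column
        linking_cols.setdefault(parent, {})[child] = "ObjectNumber"
    return linking_cols
```

With all three compartments this reproduces the three-entry default layout. ["Cells", "Nuclei"] gives {"cells": {"nuclei": "Cells_Parent_Nuclei"}, "nuclei": {"cells": "ObjectNumber"}}, and ["Cells"] gives {}, so a caller could test `if not linking_cols` (equivalent to len(compartments) == 1 here) and skip the merge step entirely.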

fefossa, Apr 14 '23 16:04