pycytominer
collate.py fails when not all compartments are present
Issue: not having all compartments affects SingleCells and collate.py
Sometimes an assay does not produce all the compartments, for example when only the cells are stained with a dye and not the nuclei or cytoplasm. That was my case: I ran the analysis with Distributed-CellProfiler and obtained CSVs only for the Cells object.
When I then tried to create the backends, the following error appeared, because I only had Cells while collate and ingest try to get all compartments:
Error: in prepare, no such table: main.Cytoplasm was generated when running sqlite3 /home/ubuntu/ebs_tmp/backend/2022_03_24_Acidification_LT/211021_102106_Plate_1/211021_102106_Plate_1.sqlite CREATE INDEX IF NOT EXISTS table_image_object_cytoplasm_idx ON Cytoplasm(TableNumber, ImageNumber, ObjectNumber);. Exiting.
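The failing step is the `CREATE INDEX` on a table that was never ingested. As a minimal sketch of one way to avoid it (not how pycytominer or its ingest tooling currently works), the index could be created only for the compartment tables that actually exist in the SQLite file. The function name `index_existing_compartments` is hypothetical; the index name and columns are copied from the error message above.

```python
import sqlite3


def index_existing_compartments(db_path, compartments=("Cells", "Nuclei", "Cytoplasm")):
    """Hypothetical helper: index only the compartment tables present in db_path.

    Returns the list of compartments that were indexed.
    """
    indexed = []
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table stored in the database file.
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        tables = {row[0] for row in rows}
        for name in compartments:
            # Case-sensitive match; skip compartments that were never ingested.
            if name not in tables:
                continue
            # Same index name and columns as in the error message.
            conn.execute(
                f"CREATE INDEX IF NOT EXISTS table_image_object_{name.lower()}_idx "
                f"ON {name}(TableNumber, ImageNumber, ObjectNumber)"
            )
            indexed.append(name)
        conn.commit()
    finally:
        conn.close()
    return indexed
```

For a database that only contains a Cells table, this would index Cells and silently skip Nuclei and Cytoplasm instead of exiting.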
Changes
To accept a different number of compartments, or even just one, I'm listing where I think the changes would go. I haven't run any tests yet, so I'm posting here to see if anyone has ideas.
- [x] Create a new argument called --compartments to be given to collate_cmd;
- [x] In collate.py, add the new argument "compartments" with ["Cells", "Nuclei", "Cytoplasm"] as default;
- [x] In collate.py, replace this line with the following to accept a different number of compartments:

      include_list = []
      for eachcompartment in compartments:
          include = "--include */" + eachcompartment + ".csv"
          include_list.append(include)
      sync_cmd = f"aws s3 sync --exclude * {(' '.join(include_list))} --include */Image.csv {remote_input_dir} {input_dir}"

- [x] In collate.py, change the loop here to for eachcompartment in compartments instead of the specified list;
- [ ] Because collate.py calls SingleCells' function to aggregate profiles, we would also need to provide the specific dictionary here depending on the compartments. Maybe change the get_default_linking_cols function to generate the dictionary from the compartments given, instead of asking the user to specify "compartment_linking_cols" inside SingleCells. The idea is to have pre-defined column names for the three objects and combine them into one dictionary, based on the hierarchy in which Nuclei can be a parent of Cells and Cytoplasm, and Cells can be a parent of Cytoplasm:

      nuclei_parent = ["Cells_Parent_Nuclei", "Cytoplasm_Parent_Nuclei"]
      cells_parent = ["Cytoplasm_Parent_Cells"]
      cells_children = ["Cells_Parent_Nuclei"]

  The dictionary would then be built from those lists with a series of if's and for's. I'm not sure how to do that, or whether it's the best solution! Something similar was mentioned in #216.
- [ ] Figure out how to get aggregate_profiles for only one compartment. Is there a way to skip get_default_linking_cols if len(compartments) == 1?
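The checklist above could be sketched roughly as follows. This is an untested-against-pycytominer illustration, not the library's implementation: `parse_args`, `build_sync_args`, and `build_linking_cols` are hypothetical names, the nested `{compartment: {other_compartment: column}}` layout is an assumption that would need to be checked against what `get_default_linking_cols` returns, and using `ObjectNumber` on the parent side of each link is also an assumption. The parent column names are the ones listed in this issue. The sync command is built as an argument list rather than a single string so that `*` is not expanded by a shell.

```python
import argparse

DEFAULT_COMPARTMENTS = ["Cells", "Nuclei", "Cytoplasm"]

# (child, parent) -> parent-link column in the child table, as named in this issue.
PARENT_COLS = {
    ("Cells", "Nuclei"): "Cells_Parent_Nuclei",
    ("Cytoplasm", "Nuclei"): "Cytoplasm_Parent_Nuclei",
    ("Cytoplasm", "Cells"): "Cytoplasm_Parent_Cells",
}


def parse_args(argv=None):
    """Parse --compartments as one or more names, defaulting to all three."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--compartments", nargs="+", default=list(DEFAULT_COMPARTMENTS))
    return parser.parse_args(argv)


def build_sync_args(compartments, remote_input_dir, input_dir):
    """Build the aws s3 sync arguments for the given compartments plus Image.csv."""
    args = ["aws", "s3", "sync", "--exclude", "*"]
    for name in [*compartments, "Image"]:
        args += ["--include", f"*/{name}.csv"]
    return args + [remote_input_dir, input_dir]


def build_linking_cols(compartments):
    """Return linking columns only for pairs where both compartments are present.

    With fewer than two compartments there is nothing to link, so the result
    is an empty dict that the caller can use to skip the merge step.
    """
    present = set(compartments)
    linking = {}
    if len(present) < 2:
        return linking
    for (child, parent), column in PARENT_COLS.items():
        if child in present and parent in present:
            # Child side: the *_Parent_* column pointing at the parent object.
            linking.setdefault(child.lower(), {})[parent.lower()] = column
            # Parent side: assumed to be matched on its own ObjectNumber.
            linking.setdefault(parent.lower(), {})[child.lower()] = "ObjectNumber"
    return linking
```

With only Cells given, `build_linking_cols` returns an empty dictionary, which could serve as the signal to aggregate the single compartment without any linking step.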