Allow for a subset of FOVs in k-means clustering notebook

Open: cliu72 opened this issue 1 year ago • 1 comment

Describe the bug: There is no option to choose a subset of FOVs in the k-means neighborhood notebook. Discovered by Avery.

Currently, the notebook gets all FOVs in the cell table (all_fovs = all_data[settings.FOV_ID].unique() in the notebook), then uses all FOVs in the segmentation directory to calculate the distance matrix (https://github.com/angelolab/ark-analysis/blob/main/src/ark/analysis/spatial_analysis_utils.py#L37). If you manually change all_fovs in the notebook to run k-means on only a subset of FOVs, the notebook errors out.

Expected behavior: Allow users to choose a subset of FOVs to run k-means on.

To Reproduce: Change all_fovs in the k-means notebook to be a subset of FOVs.

cliu72, Mar 07 '24 03:03

If we do this, the neighbors matrix will be generated and saved from only the provided subset of cells, which could cause issues in other spatial scripts that read it. It likely makes more sense to generate the distance matrices and neighbors matrix for the full data, and then just subset the neighbors data before passing it to k-means!

camisowers, Mar 07 '24 22:03
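
The approach suggested in the comment above (build the neighbors matrix for every FOV, then filter it just before clustering) might look roughly like the sketch below. It is not ark-analysis code. The `subset_neighbors` helper, the `fov` and `label` column names, and the toy cell-type columns are all assumptions made for illustration; the real neighbors matrix may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-in for the full neighbors matrix: one row per cell,
# a FOV column, a cell label, and one neighbor-count column per cell type.
neighbors_all = pd.DataFrame({
    "fov": ["fov1", "fov1", "fov2", "fov3"],
    "label": [1, 2, 1, 1],
    "tumor": [3, 0, 2, 1],
    "immune": [1, 4, 0, 2],
})


def subset_neighbors(neighbors, fovs, fov_col="fov"):
    """Return only the rows of the neighbors matrix belonging to the given FOVs.

    Hypothetical helper: the input frame is left unchanged, so the saved
    full-data matrix stays valid for other spatial scripts.
    """
    missing = set(fovs) - set(neighbors[fov_col].unique())
    if missing:
        # Fail early with a readable message instead of erroring mid-clustering.
        raise ValueError(f"FOVs not found in neighbors matrix: {sorted(missing)}")
    return neighbors[neighbors[fov_col].isin(fovs)].reset_index(drop=True)


# Only this filtered frame would be passed on to k-means.
neighbors_subset = subset_neighbors(neighbors_all, ["fov1", "fov3"])
```

Because the filter returns a new frame, the distance matrices and the neighbors matrix on disk still cover every FOV, and only the k-means input is narrowed.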