
Silhouette scores calculated using UMAP

Open Lucas-Maciel opened this issue 8 months ago • 0 comments

Hi,

I recently found your package, which will help make my scripts shorter, but I noticed that the silhouette values it returns differ from the ones I get. Checking your function, I saw that the distance matrix is computed from the 2 UMAP dimensions only:

df.umap <- getUMAP(object)[["df.umap"]]
umap.dist <- dist(x = (df.umap[, c("x", "y")]), method = "euclidean", diag = FALSE, upper = FALSE, p = 2)
...
sil <- cluster::silhouette(x = clust.mem, dist = umap.dist)

The code and functions I have seen in the past use the principal components (PCA embeddings) instead of UMAP. Here are two examples: https://rdrr.io/github/jr-leary7/YehLabClust/src/R/ComputeSilhouetteScores.R https://bioinformatics-core-shared-training.github.io/UnivCambridge_ScRnaSeq_Nov2021/Markdowns/08_ClusteringPostDsi.html#1212_Separatedness

In my own code, I use the harmony dimensions, since I'm integrating datasets:

dimensions <- 1:15
pc.dist <- dist(x = Embeddings(object = seu[["harmony"]])[, dimensions])
sil <- silhouette(x = as.numeric(seu@meta.data$seurat_clusters), dist = pc.dist)

For example, at a given resolution my code gives a score of 0.4, while scMiko gives 0.63. So I was wondering which of the two approaches is more appropriate. Thank you.
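To make the comparison concrete, here is a minimal sketch of how the same cluster labels can be scored in two different spaces. It uses simulated data only: `mean_silhouette()` is a hypothetical helper (not part of scMiko or Seurat), the 15-column matrix stands in for the harmony/PCA embedding, and its first two columns stand in for a 2-D layout. They are not an actual UMAP, so the numbers say nothing about which score is right for real data; the sketch only shows that the silhouette depends on the distance matrix it is given.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: mean silhouette width of `clusters`, using Euclidean
# distances between the rows of `embedding` (cells x dimensions).
mean_silhouette <- function(embedding, clusters) {
  d <- dist(x = embedding, method = "euclidean")
  sil <- silhouette(x = as.integer(factor(clusters)), dist = d)
  mean(sil[, "sil_width"])
}

# Simulated stand-in for a reduced-dimension embedding: 3 clusters, 15 dims.
set.seed(1)
n_dims <- 15
clusters <- rep(1:3, each = 50)
centers <- matrix(rnorm(3 * n_dims, sd = 2), nrow = 3)
emb <- matrix(rnorm(length(clusters) * n_dims), ncol = n_dims) + centers[clusters, ]

# Same labels, two different distance matrices.
score_full <- mean_silhouette(emb, clusters)           # all 15 dimensions
score_2d   <- mean_silhouette(emb[, 1:2], clusters)    # first 2 dimensions only

print(c(full = score_full, two_dims = score_2d))
```

With a Seurat object, the same helper could presumably be called as `mean_silhouette(Embeddings(seu[["harmony"]])[, 1:15], seu$seurat_clusters)` for the integrated space, and on the two UMAP coordinate columns for the scMiko-style score, which would put both numbers side by side for the same clustering.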

Lucas-Maciel, Oct 18 '23 11:10