unsupervised_analysis

clustification v1

Open · sreichl opened this issue 1 year ago · 1 comment

Collect below all issues related to the proposed ML-based clustering approach: clustification

  • [ ] Plan the tasks below
  • [ ] #28
  • [ ] #47
  • [ ] #48
  • [ ] #12

sreichl, Jun 22 '24 13:06

https://www.nature.com/articles/s41588-025-02148-8 Basically the same idea as clustification made it into Nature Genetics as a methods paper. Overcluster, then merge clusters if a random forest cannot separate them. Instead of a misclassification rate, they use label shuffling and p-values, plus a few extra tricks. Validation: first on simulated data with Gaussian blobs, then on real data with 150 cancer cell lines, where CHOIR separates them well. Cool experiments: they show why some cell lines get misclassified, e.g., based on proliferation scores. They also downsample clusters and show that CHOIR still finds them separately even with only 50 cells, while other methods miss them (so it can handle multiple resolutions).
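To make the "overcluster, then merge if a classifier cannot separate them" idea concrete, here is a minimal, dependency-free sketch of the label-shuffling test. It is not CHOIR's implementation and not clustification's either. Assumptions: a nearest-centroid classifier stands in for the random forest, overclustering is simulated by splitting one Gaussian blob in half, the 0.05 threshold is arbitrary, and all function names (`holdout_accuracy`, `separability_pvalue`, `merge_inseparable`) are hypothetical.

```python
import random
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [fmean(dim) for dim in zip(*points)]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def holdout_accuracy(points, labels):
    """Fit a nearest-centroid classifier (stand-in for a random forest)
    on every other point and score it on the remaining points."""
    groups = {0: [], 1: []}
    for p, lab in zip(points[::2], labels[::2]):
        groups[lab].append(p)
    if not groups[0] or not groups[1]:
        return 0.5  # a class is missing from the training half: treat as chance
    c0, c1 = centroid(groups[0]), centroid(groups[1])
    test = list(zip(points[1::2], labels[1::2]))
    correct = sum((dist2(p, c0) > dist2(p, c1)) == bool(lab) for p, lab in test)
    return correct / len(test)


def separability_pvalue(a_pts, b_pts, n_shuffles=200, rng=None):
    """P-value for 'the classifier separates a from b better than chance',
    estimated by comparing the real accuracy to accuracies on shuffled labels."""
    rng = rng or random.Random(0)
    points = list(a_pts) + list(b_pts)
    labels = [0] * len(a_pts) + [1] * len(b_pts)
    order = list(range(len(points)))
    rng.shuffle(order)  # mix both clusters into the train and test halves
    points = [points[i] for i in order]
    labels = [labels[i] for i in order]
    observed = holdout_accuracy(points, labels)
    hits = 0
    for _ in range(n_shuffles):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if holdout_accuracy(points, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


def merge_inseparable(clusters, alpha=0.05, rng=None):
    """Greedily merge any pair of clusters whose separation is not
    significant at level alpha (an assumed threshold); repeat until stable."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if separability_pvalue(clusters[i], clusters[j], rng=rng) > alpha:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters


def blob(rng, cx, cy, n):
    """Draw n points from a unit-variance 2D Gaussian centred at (cx, cy)."""
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]


rng = random.Random(42)
blob_a, blob_b = blob(rng, 0, 0, 60), blob(rng, 8, 8, 60)
# Simulated overclustering: blob_a is split into two arbitrary halves.
overclustered = [blob_a[:30], blob_a[30:], blob_b]
final = merge_inseparable(overclustered, rng=rng)
print(len(final), "clusters after merging")
```

The two halves of `blob_a` come from the same distribution, so they are usually merged back together, though at `alpha=0.05` a false split is still expected in roughly 5% of runs. The well-separated `blob_b` stays on its own.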

sreichl, May 09 '25 14:05