cellassign

imbalanced marker lists

Open kmh005 opened this issue 4 years ago • 2 comments

Hi, great tool! We're really pleased with its performance. We'd recommend adding the FC cutoff to the basic example as well. Can you explain how the algorithm handles imbalanced marker list sizes, or whether it's preferable to randomly sample down to the same number of markers per list? We get clearly different results with the full 120v50 lists and with 40v40, so we'd like your opinion on the appropriate choice for your algorithm. Thanks, kmh

kmh005, Dec 07 '20 19:12
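
The random-subsampling option raised in the question can be sketched as follows. This is a minimal illustration in plain Python, not part of cellassign: `subsample_markers`, the cell type labels, and the gene names are all hypothetical, and the list sizes simply mirror the 120 vs 50 case mentioned above. Repeating the draw with several seeds and comparing the resulting assignments is one way to see how much of the difference comes from which markers were dropped rather than from the list sizes.

```python
import random


def subsample_markers(marker_lists, n=None, seed=0):
    """Randomly downsample every marker list to the same size n.

    If n is not given, the size of the smallest list is used.
    """
    if n is None:
        n = min(len(genes) for genes in marker_lists.values())
    rng = random.Random(seed)  # fixed seed so a draw can be reproduced
    return {
        cell_type: sorted(rng.sample(genes, n))
        for cell_type, genes in marker_lists.items()
    }


# Hypothetical marker lists with imbalanced sizes (120 vs 50).
markers = {
    "type_A": [f"geneA{i}" for i in range(120)],
    "type_B": [f"geneB{i}" for i in range(50)],
}

balanced = subsample_markers(markers, seed=1)
print({cell_type: len(genes) for cell_type, genes in balanced.items()})
# {'type_A': 50, 'type_B': 50}
```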

Interesting question, and I'm not sure we tested this fully. There's a trade-off: more markers are generally better for assignment, but if the data you're assigning differs substantially from the data the marker genes were derived from, a smaller set of robust markers may be preferable. I'd recommend plotting marker expression by cell type after fitting to see whether there are any obviously troubling genes in either case, if that makes sense.

kieranrcampbell, Dec 08 '20 15:12
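
The check suggested in the reply above (looking at marker expression by cell type after fitting) could be approximated numerically before plotting. The sketch below is an assumption-laden illustration, not cellassign code: it uses a numeric summary in place of a plot, the function names and toy data are hypothetical, and "troubling" is defined here simply as a marker whose mean expression is not highest in the cell type it is listed for.

```python
from collections import defaultdict
from statistics import mean


def mean_expression_by_type(expr, assignments):
    """expr: {gene: [value per cell]}; assignments: [assigned type per cell]."""
    means = {}
    for gene, values in expr.items():
        by_type = defaultdict(list)
        for value, cell_type in zip(values, assignments):
            by_type[cell_type].append(value)
        means[gene] = {t: mean(v) for t, v in by_type.items()}
    return means


def troubling_markers(marker_lists, means):
    """Return markers whose mean expression peaks in a different cell type."""
    flagged = []
    for cell_type, genes in marker_lists.items():
        for gene in genes:
            top_type = max(means[gene], key=means[gene].get)
            if top_type != cell_type:
                flagged.append(gene)
    return flagged


# Hypothetical toy data: four cells, already assigned to types A and B.
assignments = ["A", "A", "B", "B"]
expr = {
    "g1": [5.0, 6.0, 1.0, 0.0],
    "g2": [0.0, 1.0, 4.0, 5.0],
    "g3": [3.0, 4.0, 0.0, 1.0],  # listed as a B marker but high in A
}
marker_lists = {"A": ["g1"], "B": ["g2", "g3"]}

means = mean_expression_by_type(expr, assignments)
print(troubling_markers(marker_lists, means))
# ['g3']
```

Running the same summary on the fits from the full lists and from the subsampled lists would show whether the flagged genes differ between the two cases.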

Thanks for your response. As a proof of concept, we're using DGE genes stratified by logFC, derived from the actual data, to determine class. Sub-sampling to the smaller class (N=40) gives a drastically different classification than the imbalanced up/down lists. Your idea of using the most robust markers makes sense; we've tried selecting by logFC and by padj (N=20). However, the result from the imbalanced classes is the most intriguing, so we were hoping the math had been tested against that scenario.

kmh005, Dec 08 '20 18:12
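
The "most robust markers" selection described in the last comment (keeping a fixed N per class ranked by logFC) might look roughly like the sketch below. Everything here is hypothetical: the function names, the `up`/`down` class labels, and the logFC values are made up for illustration, and ranking by absolute logFC is an assumption about how the selection was done. The second helper turns the selected lists into a binary gene-by-class table, on the assumption that the markers are ultimately supplied in that shape.

```python
def top_n_by_logfc(dge, n):
    """dge: {class: {gene: logFC}}; keep the n genes with the largest |logFC|."""
    selected = {}
    for cell_class, fold_changes in dge.items():
        ranked = sorted(
            fold_changes.items(), key=lambda kv: abs(kv[1]), reverse=True
        )
        selected[cell_class] = [gene for gene, _ in ranked[:n]]
    return selected


def to_binary_matrix(marker_lists):
    """Return (column labels, {gene: [0/1 per class]}) for the marker lists."""
    classes = sorted(marker_lists)
    genes = sorted({g for gs in marker_lists.values() for g in gs})
    rows = {g: [int(g in marker_lists[c]) for c in classes] for g in genes}
    return classes, rows


# Hypothetical DGE results with imbalanced list sizes (3 vs 2).
dge = {
    "up": {"g1": 2.5, "g2": 1.2, "g3": 0.4},
    "down": {"g4": -3.0, "g5": -0.8},
}

balanced = top_n_by_logfc(dge, n=2)
classes, rows = to_binary_matrix(balanced)
print(balanced)  # {'up': ['g1', 'g2'], 'down': ['g4', 'g5']}
print(classes)   # ['down', 'up']
print(rows)      # {'g1': [0, 1], 'g2': [0, 1], 'g4': [1, 0], 'g5': [1, 0]}
```

The same ranking could use padj instead of logFC by swapping the sort key to ascending adjusted p-value.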