sc.get.aggregate summary statistics

Open grst opened this issue 1 year ago • 4 comments

What kind of feature would you like to request?

Additional function parameters / changed functionality / changed defaults?

Please describe your wishes

It would be nice if sc.get.aggregate provided a way to compute summary statistics and put them in .obs of the aggregated AnnData object.

The most important one for me would be the number of cells per aggregated sample, so that samples below a certain threshold can be filtered out.

I'm not sure whether any other metrics are relevant, but in the most general case it could take a callback function.
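To illustrate the request, here is a minimal sketch of the kind of summary statistic meant above, written with plain pandas rather than scanpy. The two data frames are hypothetical stand-ins: `cell_obs` plays the role of `adata.obs`, and `agg_obs` plays the role of `.obs` of the aggregated object, assumed here to have one row per sample, indexed by the sample label. The column name `n_cells` and the threshold `min_cells` are made up for the example.

```python
import pandas as pd

# Hypothetical per-cell metadata, standing in for adata.obs.
cell_obs = pd.DataFrame(
    {"sample": ["s1", "s1", "s1", "s2", "s3", "s3"]},
    index=[f"cell{i}" for i in range(6)],
)

# Hypothetical metadata of the aggregated object (one row per sample),
# standing in for the .obs of whatever the aggregation returns.
agg_obs = pd.DataFrame(index=pd.Index(["s1", "s2", "s3"], name="sample"))

# Number of cells that went into each aggregated sample.
n_cells = cell_obs.groupby("sample").size()
agg_obs["n_cells"] = n_cells.reindex(agg_obs.index)

# Flag the samples that were built from enough cells to keep.
min_cells = 2  # assumed threshold, for illustration only
keep = agg_obs["n_cells"] >= min_cells

print(agg_obs[keep])
```

With a real aggregated AnnData object, the boolean mask `keep` could then be used to subset the observations, provided its index matches the aggregated object's observation names.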

grst avatar May 16 '24 06:05 grst

Sounds like a useful feature, you wanna do a PR?

flying-sheep avatar Jun 07 '24 10:06 flying-sheep

I don't think I'll have the time in the near future.

grst avatar Jun 07 '24 15:06 grst

It would also be nice to add some stats for vars. Let's say we have aggregated cells from control and stim samples. The percentage of cells expressing gene A in clusterX, computed for every sample in each group, would allow us to filter out genes that are expressed in very few cells but with relatively high counts. This would reduce the false positives that these genes cause in pseudobulk differential gene expression analysis.

osmanmerdan avatar Jul 17 '24 20:07 osmanmerdan

Hi @osmanmerdan,

this should already be possible using

sc.get.aggregate(..., func="count_nonzero")
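As a rough sketch of how a nonzero count per group turns into the percentage asked about above, the following uses plain pandas and NumPy on made-up data instead of calling scanpy. The matrix `counts`, the labels in `groups`, and the threshold `min_frac` are all hypothetical. Note that the division needs the number of cells per group, which is the statistic requested at the top of this issue.

```python
import numpy as np
import pandas as pd

# Hypothetical raw counts: 5 cells x 2 genes.
# geneB has a high count but is expressed in a single cell.
counts = pd.DataFrame(
    np.array([[0, 50], [2, 0], [0, 0], [1, 0], [3, 0]]),
    columns=["geneA", "geneB"],
)
groups = pd.Series(["ctrl", "ctrl", "ctrl", "stim", "stim"], name="group")

# Per group and gene: number of cells with a nonzero count.
n_expressing = (counts > 0).groupby(groups).sum()

# Per group: total number of cells.
n_cells = groups.value_counts().reindex(n_expressing.index)

# Fraction of cells in each group that express each gene.
frac_expressing = n_expressing.div(n_cells, axis=0)

# Keep genes expressed in at least `min_frac` of the cells in any group.
min_frac = 0.5  # assumed threshold, for illustration only
keep_genes = (frac_expressing >= min_frac).any(axis=0)

print(frac_expressing)
print(keep_genes)
```

Here geneB is dropped despite its high count, because it is detected in only one of three ctrl cells and in no stim cells.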

grst avatar Jul 18 '24 07:07 grst