Submodular optimization using apricot

Open timtreis opened this issue 2 years ago • 3 comments

What kind of feature would you like to request?

New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?

Please describe your wishes

I'm currently implementing a function that takes an AnnData object and subsamples a given representation using apricot (https://github.com/jmschrei/apricot). The goal is to pick, near-optimally, a reduced number of points that is still representative of the latent space.
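To make the idea concrete, here is a minimal NumPy sketch of greedy facility-location selection, one common submodular objective for picking representative points. It is an illustration of the technique only, not apricot's API or the function described above: the name `facility_location_subsample` and the distance-based similarity are assumptions made for this example, and the dense pairwise matrix limits it to small inputs.

```python
import numpy as np


def facility_location_subsample(X, n_samples):
    """Greedily pick row indices of X that maximise a facility-location objective.

    Hypothetical helper for illustration; not part of scanpy or apricot.
    X is an (n_obs, n_dims) array, e.g. a PCA embedding.
    """
    X = np.asarray(X, dtype=float)
    n_obs = X.shape[0]
    if not 0 < n_samples <= n_obs:
        raise ValueError("n_samples must be between 1 and the number of rows")

    # Dense pairwise Euclidean distances (O(n_obs^2) memory).
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    dists = np.sqrt(np.maximum(sq_dists, 0.0))

    # Assumed similarity: non-negative, larger for closer points.
    sim = dists.max() - dists

    best = np.zeros(n_obs)  # best similarity of each point to the selected set
    available = np.ones(n_obs, dtype=bool)
    selected = []
    for _ in range(n_samples):
        # Marginal gain of each candidate column: total improvement in coverage.
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        gains[~available] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        available[j] = False
        best = np.maximum(best, sim[:, j])
    return np.array(selected)
```

With an AnnData object, one might call this on a stored embedding such as `adata.obsm["X_pca"]` (assuming PCA has been computed) and subset with `adata[idx]`. For real datasets, an optimized implementation such as apricot would be the more sensible choice.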

Is this something within the scope of scanpy? Once it's done, it wouldn't take much effort to polish it up for a PR. The dependency load seems fairly low.

timtreis, Feb 16 '24 14:02

The idea of having "smart subsample" functionality available in scanpy has been a topic of discussion for a while.

I would like to see a benchmark of these methods on single cell data before choosing one to include here. Are you aware of anything in this space?

Update:

It looks like the lab behind apricot has published some work on this: https://dl.acm.org/doi/pdf/10.1145/3388440.3412409

ivirshup, Feb 16 '24 15:02

Joining on the "smart subsample" part which we talked about a few months ago.

As I remember, the under/oversampling methods of imbalanced-learn were something we chatted about back then.

I opened a small dummy scverse-package draft here just to check how well this can be transferred for AnnData, but never really got to push it much.

Maybe we could find common ground and join forces on building something for this?

eroell, Feb 29 '24 15:02

cc: @mumichae about our conversation the other day.

I've been thinking that a good entry point here could just be a notebook that demonstrates using these packages on single-cell data, which we could point to on https://scverse.org/learn (hosted on https://github.com/scverse/scverse-tutorials). This could be a good starting point for anyone who wants to jump in and investigate further.

And, for completeness, I would also want to point out https://github.com/brianhie/geosketch as another promising subsampling method.

ivirshup, Mar 28 '24 10:03