Improve runtime of kBET

Open mbuttner opened this issue 2 years ago • 5 comments

kBET is slow, partly because it runs many computations multiple times (for instance, to obtain good statistics for the rejection rate).

  • [ ] ensure that neighbourhoods are computed at most once
  • [ ] revisit the subsampling implementation
  • [ ] use a more efficient kNN computation (FNN at the moment)
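As a rough illustration of the first two items, here is a minimal NumPy sketch, not the kBET package code. It assumes a brute-force kNN that excludes the cell itself, a Pearson chi-squared statistic per neighbourhood, two batches (so the 5% critical value with one degree of freedom is 3.841), and hypothetical function names. The neighbourhoods and per-cell statistics are computed once; the repeated subsampling for the rejection rate only re-draws cell indices.

```python
import numpy as np

def knn_indices(X, k):
    """Brute-force k nearest neighbours, excluding the cell itself (assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                     # never pick a cell as its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def chi2_stat_per_cell(neighbours, batch, n_batches):
    """Pearson chi-squared statistic: local vs. global batch composition."""
    k = neighbours.shape[1]
    expected = np.bincount(batch, minlength=n_batches) / batch.size * k
    nb_batches = batch[neighbours]                   # shape (n_cells, k)
    observed = np.stack(
        [(nb_batches == b).sum(axis=1) for b in range(n_batches)], axis=1
    )
    return ((observed - expected) ** 2 / expected).sum(axis=1)

def rejection_rates(stats, critical_value, frac=0.1, n_repeats=100, seed=0):
    """Repeatedly subsample cells; reuses the precomputed per-cell statistics."""
    rng = np.random.default_rng(seed)
    rejected = stats > critical_value
    n_test = max(1, int(frac * stats.size))
    return np.array([
        rejected[rng.choice(stats.size, n_test, replace=False)].mean()
        for _ in range(n_repeats)
    ])

# Toy data: 200 cells, 5 dimensions, 2 randomly assigned batches.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
batch = rng.integers(0, 2, size=200)

nn = knn_indices(X, k=15)                               # computed once
stats = chi2_stat_per_cell(nn, batch, n_batches=2)      # computed once
rates = rejection_rates(stats, critical_value=3.841)    # chi2, df=1, alpha=0.05
```

The brute-force distance matrix is only there to keep the sketch self-contained; for real data sizes an approximate or tree-based kNN would replace `knn_indices` without changing the rest.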

mbuttner avatar Apr 29 '22 08:04 mbuttner

Hi there,

First of all, thank you for kBET; it is a very useful tool. I am trying to use kBET to assess the integration quality of single-cell samples (processed with Seurat). First question: in this case, would the batch number be the number of cells in my integrated object, or the number of samples (i.e. "stimulated", "non-stimulated")? I used the recommended lines (separating the kNN computation from the kBET function), but unfortunately the running times are huge. My object is 17k cells x 20k genes. Would you advise me to randomly subset my data before running kBET?

Thank you in advance for your help,

Best, Lilia

liliay avatar Nov 09 '22 15:11 liliay

Hi @liliay

thank you for trying kBET.

  1. I would use the batch label of the cells, not the condition.
  2. About the runtime: I recommend reducing the number of input dimensions. You can compute a PCA on the data and use only the first 50 PCs, or, if you have integrated the data with Seurat, take the embedding space as input. This should be of much lower dimension. Random subsampling might not be necessary.
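A minimal sketch of the dimension reduction suggested in point 2, written in Python with NumPy rather than R, on toy data. The function name `pca_embed` is hypothetical, and the PCA is done via a plain SVD on the centred matrix; in practice the PCA or integrated embedding already stored in the Seurat object would serve the same purpose.

```python
import numpy as np

def pca_embed(X, n_components=50):
    """Project cells (rows) onto the first principal components via SVD."""
    Xc = X - X.mean(axis=0)                            # centre each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    n = min(n_components, S.size)
    return U[:, :n] * S[:n]                            # PC scores, shape (n_cells, n)

# Toy stand-in for a cells x genes matrix (the real one is 17k x 20k).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1000))
emb = pca_embed(X, n_components=50)                    # use this as kNN input
```

The kNN search then runs on 50 columns instead of thousands of genes, which is where most of the runtime saving comes from.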

Best, Maren

mbuttner avatar Nov 10 '22 15:11 mbuttner

Just wanted to plug our extremely fast Python version of kBET

https://github.com/YosefLab/scib-metrics/pull/60

It will be in this package soon. It does not have all the same functionality (no bootstrapping currently), but these things should not be difficult to add.

adamgayoso avatar Dec 10 '22 01:12 adamgayoso

@adamgayoso

Thanks for sharing this! Your code looks quite neat and it is fantastic to learn about the speed-up. Did you also include an estimate of the neighborhood size? I might have missed it in the code.

mbuttner avatar Dec 20 '22 12:12 mbuttner

We did not, as it seemed the original scib package used a fixed k

https://github.com/theislab/scib/blob/da9c39b89b95b2ec34b6f547445e931571120ba6/scib/metrics/kbet.py#L144-L151

adamgayoso avatar Dec 20 '22 16:12 adamgayoso