kBET
Improve runtime of kBET
kBET is slow, partly because it runs many computations multiple times (for instance, to obtain good statistics for the rejection rate).
- [ ] ensure that neighbourhoods are computed at most once
- [ ] revisit the subsampling implementation
- [ ] use a more efficient kNN computation (FNN at the moment)
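The first item above, computing neighbourhoods at most once, can be sketched as follows. This is a hypothetical Python illustration (kBET itself is an R package using FNN), showing how a kNN index could be queried once and the resulting neighbour table reused across repeated subsampling runs:

```python
# Illustrative sketch only, not the kBET R code: build the kNN graph once
# and reuse it, instead of recomputing neighbours inside every repetition.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))      # cells x reduced dimensions (toy data)

k = 25
tree = cKDTree(X)
# Query once; the index table can then be reused by every downstream
# test or subsample without touching the tree again.
_, knn_idx = tree.query(X, k=k + 1)  # k+1 because each point is its own NN
knn_idx = knn_idx[:, 1:]             # drop the self-neighbour
print(knn_idx.shape)                 # (1000, 25)
```

Repeated subsampling would then only index rows of `knn_idx` rather than re-running the kNN search.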
Hi there,
First of all, thank you for kBET, a very useful tool. I am trying to use kBET to assess the integration quality of single-cell samples (processed with Seurat). First question: in this case, would the batch number be the number of cells in my integrated object, or the number of samples (i.e. "stimulated", "non-stimulated")? I used the recommended lines (separating the kNN computation from the kBET function), but unfortunately the running times are huge. My object is 17k cells x 20k genes. Would you advise me to randomly subset my data before running kBET?
Thank you in advance for your help,
Best, Lilia
Hi @liliay
thank you for trying kBET.
- I would use the batch label of the cells, not the condition.
- About the runtime: I recommend reducing the number of input dimensions. You can compute a PCA on the data and use only the first 50 PCs, or, if you have integrated the data with Seurat, take the embedding space as input. This has much lower dimensionality. Random subsampling might not be necessary.
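The PCA advice above can be sketched in a few lines. This is a toy Python illustration (the thread itself uses Seurat/R; the data and the choice of 50 components are just examples), projecting a cells-by-genes matrix onto its top principal components via SVD before any kNN/kBET step:

```python
# Toy sketch of reducing dimensionality before running kBET.
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(500, 2000))   # cells x genes (toy data)

# Centre the matrix, then project onto the top 50 principal components.
centred = expr - expr.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
embedding = centred @ Vt[:50].T       # cells x 50 PCs
print(embedding.shape)                # (500, 50)
```

The 50-dimensional `embedding` (or a Seurat integration embedding) would then be the input to the kNN search instead of the full gene space.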
Best, Maren
Just wanted to plug our extremely fast Python version of kBET:
https://github.com/YosefLab/scib-metrics/pull/60
It will be in this package soon. It does not have all the same functionality (no bootstrapping currently), but these things should not be difficult to add.
@adamgayoso
Thanks for sharing this! Your code looks quite neat and it is fantastic to learn about the speed-up. Did you also include an estimate of the neighborhood size? I might have missed it in the code.
We did not, as it seemed the original scib package used a fixed k:
https://github.com/theislab/scib/blob/da9c39b89b95b2ec34b6f547445e931571120ba6/scib/metrics/kbet.py#L144-L151
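For readers following along, the fixed-k variant discussed here boils down to a per-cell chi-squared test. This is a simplified Python sketch of that idea (not the scib or scib-metrics implementation; data, k, and the 0.05 threshold are illustrative): for each cell, compare the batch composition of its k nearest neighbours against the global batch frequencies, and report the fraction of cells where the test rejects.

```python
# Minimal sketch of the kBET idea with a fixed k: well-mixed batches
# should give a low rejection rate, batch effects a high one.
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import chisquare

rng = np.random.default_rng(0)
n, k = 600, 50
X = rng.normal(size=(n, 20))                  # cells x reduced dimensions
batch = rng.integers(0, 2, size=n)            # two well-mixed batches

global_freq = np.bincount(batch, minlength=2) / n
_, knn = cKDTree(X).query(X, k=k + 1)
knn = knn[:, 1:]                              # drop the self-neighbour

pvals = []
for i in range(n):
    observed = np.bincount(batch[knn[i]], minlength=2)
    expected = global_freq * k                # same total count as observed
    pvals.append(chisquare(observed, expected).pvalue)

rejection_rate = np.mean(np.array(pvals) < 0.05)
print(rejection_rate)                         # low when batches are well mixed
```

Bootstrapping, as in the original R package, would wrap this loop over repeated subsamples of cells; with a fixed k the neighbour table only needs to be built once.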