kBET
kBET on Harmony or CCA integration
Hello! I am trying to use kBET on a very large integrated dataset (40,000 genes across 180,000 cells). I have a few questions about it.
- Since neither Harmony nor CCA generates batch-corrected counts, what should I pass as input to kBET? If I want to compare the result with the unintegrated datasets (where the counts are unchanged), is it better to give kBET the embeddings as input?
- I am having some trouble working with the whole dataset. Is there a maximum size for the input data frame?
Thank you for your help!
Hi @martina811, thank you for your questions!
- You can run kBET on the embeddings directly. Internally kBET computes a k-nearest neighbor graph to assess batch effects.
- You can tweak kBET to run faster, which may also make it feasible on your larger dataset. The fastest option is usually to pass a precomputed k-nearest neighbor graph in the same structure that the FNN package provides, turn off any pre-processing, and set a fixed neighborhood size k (see https://github.com/theislab/kBET?tab=readme-ov-file#variations).
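To make the two points above concrete, here is a minimal Python sketch of the general idea: build the nearest-neighbor lists once on the embedding coordinates with a fixed k, then compare the batch composition of each neighborhood with the global batch frequencies. This is not kBET's implementation or its API (kBET is an R package). The function names `knn_indices` and `neighbourhood_chi2`, the toy 2-D data, and the value of `K` are all assumptions made for illustration, and the brute-force search is only suitable for tiny inputs.

```python
import math
import random
from collections import Counter
from statistics import mean


def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point (the point itself excluded)."""
    result = []
    for i, p in enumerate(points):
        by_distance = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append([j for _, j in by_distance[:k]])
    return result


def neighbourhood_chi2(neighbours, batch):
    """Pearson chi-square statistic per cell: local vs. global batch frequencies."""
    n = len(batch)
    global_freq = {b: c / n for b, c in Counter(batch).items()}
    stats = []
    for idx in neighbours:
        local = Counter(batch[j] for j in idx)
        k = len(idx)
        stats.append(sum(
            (local[b] - k * f) ** 2 / (k * f)
            for b, f in global_freq.items()
        ))
    return stats


# Toy 2-D "embeddings" (assumption: stand-ins for Harmony/CCA/PCA coordinates).
rng = random.Random(0)
batch = ["A"] * 50 + ["B"] * 50
mixed = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in batch]
separated = [
    (rng.gauss(0, 1) + (100 if b == "B" else 0), rng.gauss(0, 1)) for b in batch
]

K = 10  # fixed neighbourhood size, chosen arbitrarily for the toy data
mixed_score = mean(neighbourhood_chi2(knn_indices(mixed, K), batch))
separated_score = mean(neighbourhood_chi2(knn_indices(separated, K), batch))
print(f"mixed: {mixed_score:.2f}  separated: {separated_score:.2f}")
```

A higher average statistic means neighborhoods are dominated by single batches, so the well-mixed toy embedding should score lower than the separated one. Because the neighbor search is the expensive step, computing it once and reusing it is what makes the precomputed-graph route attractive on large datasets.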
I hope that helps!