BPCells

incorporate with batch-effect correction methods

Open • Feilijiang opened this issue 1 year ago • 1 comment

Hi Ben,

Thank you for the awesome tool. I have recently started using it to analyze my datasets. However, I am encountering strong batch effects in my data. I was wondering if you have any plans to incorporate Harmony or other batch-effect correction methods into the tool. This feature would be incredibly helpful, especially for large datasets.

Please forgive me if I have overlooked something. Many thanks, and I look forward to your reply.

Feilijiang, May 31 '24 16:05

Hi @Feilijiang, this is a good question!

Many batch-correction methods such as Harmony operate on the PCA matrix, not the full RNA counts matrix. Disk-backed calculations with BPCells can be quite helpful for calculating the PCA matrix, but once you have the PCA matrix, disk-backed operations are usually not required and you can use tools that work fully in memory. (In R, the PCs take about 400 MB per million cells assuming 50 PCs stored as 8-byte doubles, so you could hold the PCs for a 20M-cell dataset in just 8 GB of RAM.)
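To make the memory estimate concrete, here is a small base-R sketch of the arithmetic. The helper name `pca_memory_gb` is made up for illustration, and it assumes a dense matrix of 8-byte doubles and ignores any extra copies R may make while working.

```r
# Rough RAM needed to hold a dense cells x PCs matrix.
# Assumes 8-byte doubles; pca_memory_gb is a hypothetical helper name.
pca_memory_gb <- function(n_cells, n_pcs = 50, bytes_per_value = 8) {
  n_cells * n_pcs * bytes_per_value / 1e9
}

pca_memory_gb(1e6)   # about 0.4 GB for 1M cells
pca_memory_gb(20e6)  # about 8 GB for 20M cells
```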

The good news here is that Harmony already accepts a PCA matrix as input. The examples in the Harmony docs show you can run Harmony directly on a PCA matrix as follows:

harmony_object <- HarmonyMatrix(pca_matrix, meta_data, 'dataset',
                                    do_pca=FALSE, return_object=TRUE)

BPCells is unlikely to implement wrapper functions around Harmony since I want to keep the BPCells functionality focused on disk-backed operations, but I'd definitely consider putting up tutorials showing how to use Harmony at the end of a BPCells workflow.

In summary, I'd suggest that you do normalization + PCA using BPCells, then use the Harmony package directly for batch correction once you have the PCA matrix.

If you need help getting to a PCA, I'd suggest either following the steps in the BPCells tutorial or using Seurat's wrappers around BPCells.
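As a rough illustration of that workflow, the sketch below strings together normalization, variable-gene selection, and a truncated SVD in BPCells, then hands the resulting cells x PCs matrix to Harmony. It is only a sketch under several assumptions: the function name `run_pca_then_harmony` and its arguments are hypothetical, `mat` is assumed to be a genes x cells BPCells counts matrix, `meta_data` is assumed to have one row per cell in the same order as the columns of `mat`, and the choices of 2000 variable genes and z-score scaling are illustrative rather than recommended settings. The Harmony call mirrors the one quoted above, without `return_object`, which is assumed here to return the corrected embedding; newer Harmony releases may prefer a different entry point, so check the docs for your installed version.

```r
# Sketch only: assumes the BPCells and harmony packages are installed.
# run_pca_then_harmony() and its arguments are hypothetical names.
run_pca_then_harmony <- function(mat, meta_data, batch_col = "dataset",
                                 n_pcs = 50, n_var_genes = 2000) {
  stopifnot(requireNamespace("BPCells", quietly = TRUE),
            requireNamespace("harmony", quietly = TRUE))

  # Normalize by counts per cell, then log-transform (still disk-backed).
  mat <- BPCells::multiply_cols(mat, 1 / Matrix::colSums(mat))
  mat <- log1p(mat * 10000)

  # Pick the most variable genes from per-gene mean/variance statistics.
  stats <- BPCells::matrix_stats(mat, row_stats = "variance")
  gene_var <- stats$row_stats["variance", ]
  keep <- sort(head(order(gene_var, decreasing = TRUE), n_var_genes))

  # Center and scale the selected genes, then run a truncated SVD.
  gene_mean <- stats$row_stats["mean", keep]
  mat_norm <- (mat[keep, ] - gene_mean) / sqrt(gene_var[keep])
  svd <- BPCells::svds(mat_norm, k = n_pcs)

  # Cells x PCs embedding: right singular vectors scaled by singular values.
  pca <- sweep(svd$v, 2, svd$d, "*")
  rownames(pca) <- colnames(mat)

  # From here on everything is in memory; same call shape as quoted above.
  harmony::HarmonyMatrix(pca, meta_data, batch_col, do_pca = FALSE)
}
```

A call would look something like `run_pca_then_harmony(mat, meta_data, "dataset")`, with `mat` opened from a BPCells matrix directory. Only the PCA step touches the full counts matrix; the Harmony step sees just the small dense embedding.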

bnprks, May 31 '24 23:05