
Comparison with Scarf

Open · parashardhapola opened this issue 2 years ago · 2 comments

Hi Benjamin,

I wanted to applaud your exceptional work on BPCells! It's truly remarkable and will make a significant impact in the world of single-cell analysis.

I hope the manuscript writing is progressing smoothly for you.

I also wanted to bring Scarf, a Python package for single-cell data analysis, to your attention. It shares many of the same objectives as BPCells, and I believe a head-to-head comparison between the two would be very interesting. Scarf was published in Nature Communications (Aug 2022) and was benchmarked against Scanpy and Seurat across a range of parameters.

Best regards, Parashar

parashardhapola · Sep 19 '23 17:09

Hi Parashar,

Great to hear from you! Yes, I've seen Scarf, and although I haven't had many chances to use it personally, it seems like a nice direction. If my memory serves me correctly, Scarf focuses more on KNN calculation, UMAP, and clustering (i.e. post-PCA steps), whereas BPCells has focused more on normalization + PCA, storage optimization, and scATAC-seq fragment-level processing. I think both are important areas for improving the single-cell analysis ecosystem, so if you're interested in implementing some interoperability, I'd be happy to work with you on that so users can more easily take advantage of the strengths of each library.

In terms of benchmarking, my current plan is to follow a strategy similar to the Scarf paper's and focus on benchmarks against Seurat + Scanpy. That said, if you'd be open to providing some demo scripts, I'd be happy to run those comparisons on the rest of my benchmark datasets. If the results are competitive, representative of what a typical user might achieve, and ready in time for the manuscript submission, I'd be happy to include them.

The main challenge I'd anticipate for you would be getting identical end-to-end results, within reasonable numerical accuracy, on the benchmarking tasks while skipping any unrelated work that would unfairly slow down processing. Ideally, this would also use only the public APIs. If you're up for giving this a go, I can provide a small demo dataset and Scanpy scripts to reproduce. I can reach out to your @med.lu.se email address if you still have access to it.

If this sounds like something interesting to you, I'd be happy to coordinate the details over email.

-Ben

bnprks · Sep 20 '23 17:09

Thanks for the reply, Ben!

I'd be happy to chip in. I can try to "transpile" your Scanpy scripts into Scarf equivalents. If that works, it should be plug-and-play for your workflow.

Please use my new email: [email protected]

/Parashar

parashardhapola · Sep 20 '23 19:09