
library integration

Open jdmontenegro opened this issue 6 months ago • 0 comments

Hi, I have recently started to play with single-cell data analysis, and your bioRxiv paper and approach sound really interesting. I understand that PCA-based dimensionality reduction probably makes the wrong assumption about the topology of the underlying data. With that in mind, I was curious how you handle the integration of multiple datasets. Traditionally, integration is based on shared variable PCs (eigenvectors), but those are selected assuming the same underlying topology. In your package, I can see that for biological replicates we could select the same topology for two libraries and then pick the most variable eigenvectors for integration. But what happens if the biological replicates were prepared with different library preps, which introduces some kind of batch effect? That batch effect would influence the selection of the best-fitting topology for the data, and could make it difficult, if not impossible, to integrate datasets that should share the same underlying biology and composition.

My question is: how can we handle these scenarios with TopOMetry? How should we select eigenvectors for the integration of multiple datasets, and how do we prevent sampling methods from introducing batch artifacts into the model? These may be naive questions, but I would like to understand your take on them. Best regards,

jdmontenegro, Dec 19 '23 13:12