moscot Pull/push in a batch-wise fashion

Hi there! For a simple TemporalProblem, I held out some genes from the embedding computation (simple PCA) and computed the coupling. I would now like to use the coupling to predict expression values of the held-out genes (at either t_1 or t_2; both are possible) as a means of validation. However, when I call tp.push(source=8.0, target=8.5, data=gexp_sc, scale_by_marginals=True), where gexp_sc is the gene expression matrix of the held-out genes on the source cells, my kernel dies. I assume that's because the matrix multiplication is carried out in a dense formulation, all at once. Is it possible to do this in a batch-wise fashion, i.e. by loading only small chunks of the coupling into memory at a time?

Aug 11 '23 12:08 Marius1311

Hi @Marius1311! I think #559 is related, and there are some possible solutions there. Let us know if it works!

Aug 11 '23 15:08 giovp

Great, thanks @giovp! I guess this is also related to https://github.com/theislab/moscot/issues/569.

A solution that works for me is specifying batch_size=x in the problem's solve method, even though that's not actually required to solve the problem, as it's quite small. However, that seems to imply that downstream computations are also batched, since I can run

out = tp.push(source=8.0, target=8.5, data=gexp_src, scale_by_marginals=True, return_all=True, key_added=None)

now without any issues. However, this is a bit clumsy, as it requires me to solve the problem in a (slower) batch-wise fashion even though I could solve it in offline mode. I therefore think it would be nice to decouple the two batch sizes, so that a problem can be solved with one batch size and pull/push can be used downstream with another.
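
To make the idea of a decoupled, batched push concrete, here is a minimal NumPy sketch. It is not moscot's implementation: the function push_in_batches and the callable get_rows are hypothetical names, and the sketch assumes that a block of coupling rows can be produced on demand. It only shows that P^T X can be accumulated block by block, so the full coupling never has to sit in memory at once.

```python
import numpy as np


def push_in_batches(get_rows, n_source, x, batch_size=1024):
    """Compute P.T @ x by accumulating over row blocks of the coupling P.

    get_rows(start, stop) is a hypothetical callable returning the dense
    block P[start:stop, :] with shape (stop - start, n_target).
    x has shape (n_source, n_genes).
    """
    x = np.asarray(x, dtype=float)
    out = None
    for start in range(0, n_source, batch_size):
        stop = min(start + batch_size, n_source)
        block = get_rows(start, stop)        # (b, n_target)
        contrib = block.T @ x[start:stop]    # (n_target, n_genes)
        out = contrib if out is None else out + contrib
    return out


# Toy check against the dense product on a small random coupling.
rng = np.random.default_rng(0)
P = rng.random((500, 300))
P /= P.sum()                                 # total mass 1, like a coupling
X = rng.random((500, 20))

Y_batched = push_in_batches(lambda i, j: P[i:j], P.shape[0], X, batch_size=64)
Y_dense = P.T @ X
```

In this sketch the batch size used for the push is just a function argument, independent of however the coupling was obtained, which is the decoupling I have in mind.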

Aug 15 '23 11:08 Marius1311

Sorry, partly unrelated: if I want to impute gene expression at the target using the source, would I have to use scale_by_marginals? Intuitively, I would say no, as all I want is Y = P^T X, where P is the coupling, X is the known gene expression in the source, and Y is the unknown gene expression in the target. So I just want this matrix multiplication, with no additional scaling.
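
For reference, a small NumPy sketch of the two quantities in question. It makes no claim about what scale_by_marginals does internally in moscot; it only compares the plain product Y = P^T X with a variant that divides each target cell's row by the mass that cell receives (the column sums of P), which turns each prediction into a weighted average of source expression. All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_tgt, n_genes = 6, 4, 3

# Toy coupling with total mass 1 and toy source expression.
P = rng.random((n_src, n_tgt))
P /= P.sum()
X = rng.random((n_src, n_genes))

# Plain push: no additional scaling.
Y_raw = P.T @ X                       # (n_tgt, n_genes)

# Weighted-average variant: divide by the mass arriving at each target cell.
col_mass = P.sum(axis=0)              # (n_tgt,)
Y_avg = Y_raw / col_mass[:, None]     # rows are convex combinations of rows of X
```

Because the entries of P sum to 1, Y_raw is on a much smaller scale than X, whereas each row of Y_avg stays within the per-gene range of X. Which of the two is appropriate depends on whether the imputed values should be comparable to measured expression.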

Aug 15 '23 11:08 Marius1311
