moscot
Pull/push in a batch-wise fashion
Hi there! For a simple `TemporalProblem`, I've held out some genes (from the embedding computation, simple PCA) and computed the coupling. I would now like to use the coupling to predict expression values of the held-out genes (at either t_1 or t_2, both possible), as a means of validation. However, when calling `tp.push(source=8.0, target=8.5, data=gexp_sc, scale_by_marginals=True)`, where `gexp_sc` is the gene expression matrix of held-out genes on the source cells, my kernel dies. I assume that's because the matrix multiplication is carried out using a dense formulation, all at once. Is it somehow possible to do this in a batch-wise fashion, i.e. by only loading small chunks of the coupling into memory at once?
Hi @Marius1311! I think #559 is related, and there are some possible solutions there. Let us know if it works!
Great, thanks @giovp! I guess this is also related to https://github.com/theislab/moscot/issues/569.
A solution that works for me is specifying `batch_size=x` in the problem's `solve` method, even though that's not actually required to solve the problem, as it's quite small. However, that seems to imply that downstream computations are also batched; I can run
out = tp.push(source=8.0, target=8.5, data=gexp_src, scale_by_marginals=True, return_all=True, key_added=None)
now fine without any issues. However, this is a bit clumsy, as it requires me to solve the problem in a (slower) batch-wise fashion, even though I could solve it in offline mode. Thus, I think it would be nice to decouple the two `batch_size`s, to allow a problem to be solved using some batch size, and to use `pull`/`push` downstream with another batch size.
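To make the request concrete, here is a minimal NumPy sketch of a push that only ever holds a few rows of the coupling in memory. It is not moscot's implementation: `push_in_batches` and the `coupling_rows(start, stop)` callback are hypothetical names, and the sketch assumes some way exists to materialize a block of coupling rows on demand. The batch size used here is independent of whatever was used to solve the problem.

```python
import numpy as np


def push_in_batches(coupling_rows, n_source, data, batch_size=1024):
    """Compute P.T @ data without materializing the full coupling P.

    coupling_rows(start, stop) is a hypothetical callback that returns
    rows start..stop of P as an array of shape (stop - start, n_target).
    """
    out = None
    for start in range(0, n_source, batch_size):
        stop = min(start + batch_size, n_source)
        block = coupling_rows(start, stop)      # (stop - start, n_target)
        contrib = block.T @ data[start:stop]    # (n_target, n_features)
        out = contrib if out is None else out + contrib
    return out


# Toy check on a small dense coupling (assumed data, for illustration only).
rng = np.random.default_rng(0)
n_source, n_target, n_genes = 50, 40, 7
P = rng.random((n_source, n_target))
P /= P.sum()                                    # total mass 1, like a coupling
X = rng.random((n_source, n_genes))             # stand-in for held-out expression

Y_batched = push_in_batches(lambda i, j: P[i:j], n_source, X, batch_size=8)
```

Peak memory for the coupling is then `batch_size * n_target` entries instead of `n_source * n_target`, at the cost of recomputing each block whenever it is needed.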
Sorry, partly unrelated: if I want to impute gene expression at the target using the source, would I have to use `scale_by_marginals`? Intuitively, I would say no, as all I want is `Y = P^T X`, where `P` is the coupling, `X` is known gene expression in the source, and `Y` is my unknown gene expression in the target. So I just want this matrix multiplication, with no additional scaling.
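For reference, a small NumPy sketch of the two quantities in question, on toy data. It assumes a coupling `P` of shape `(n_source, n_target)` whose entries sum to 1; whether `scale_by_marginals=True` corresponds exactly to the column normalization shown here is an assumption that should be checked against the moscot documentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_source, n_target, n_genes = 5, 4, 3

P = rng.random((n_source, n_target))
P /= P.sum()                         # toy coupling with total mass 1
X = rng.random((n_source, n_genes))  # toy source expression

# Plain product: each target row is a sum weighted by P[:, j], so its scale
# depends on the mass of target cell j (roughly 1 / n_target here).
Y_raw = P.T @ X

# Dividing by the column sums of P (the target marginal) turns each target
# row into a weighted average of source expression, on the same scale as X.
col_mass = P.sum(axis=0)
Y_avg = Y_raw / col_mass[:, None]
```

With the plain product, `Y_raw` is systematically smaller than `X` because each column of `P` carries only a fraction of the total mass; `Y_avg` stays within the range of the source values for each gene.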