scvi-tools
Implement scHPF
Proposal to implement scHPF.
- Code: https://github.com/simslab/scHPF
- Paper: https://www.embopress.org/doi/full/10.15252/msb.20188557
Level of difficulty: Intermediate/advanced. Probably best to use Pyro for this one.
I translated this model to PyMC3 at some point. It was fairly straightforward because the model does not have many variables (just 4, if I remember correctly). In the original implementation, they use various tricks to set priors based on the data and, if I remember correctly, train on the full data. The original implementation doesn't account for batch effects in any way, so it is not straightforward to apply to many datasets. I would be interested to see how well it works in Pyro with VI (compared to EM) when training on full data, when training in mini-batches, and when using amortised inference. The cell2location repo defines a useful class (https://github.com/BayraktarLab/cell2location/blob/master/cell2location/distributions/AutoNormalEncoder.py) for doing amortisation with a convenient interface (https://github.com/BayraktarLab/cell2location/blob/master/cell2location/models/_cell2location_module.py#L205-L230). I can look into writing this model over the next 2 months if you are interested.
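For reference, here is the generative model as I recall it from the hierarchical Poisson factorization (HPF) formulation that scHPF builds on. This is a sketch from memory, not copied from the scHPF code: the hyperparameter names $a', b', a, c', d', c$ and the shape/rate parameterization of the Gamma are assumptions that should be checked against the paper. With $i$ indexing cells, $g$ genes and $k$ factors, the four latent variables are $\xi$, $\theta$, $\eta$ and $\beta$:

```latex
\begin{align}
\xi_i &\sim \mathrm{Gamma}(a', b') && \text{per-cell scale} \\
\theta_{ik} &\sim \mathrm{Gamma}(a, \xi_i) && \text{cell factor weights} \\
\eta_g &\sim \mathrm{Gamma}(c', d') && \text{per-gene scale} \\
\beta_{gk} &\sim \mathrm{Gamma}(c, \eta_g) && \text{gene factor weights} \\
x_{ig} &\sim \mathrm{Poisson}\Big(\sum_{k} \theta_{ik}\,\beta_{gk}\Big) && \text{observed counts}
\end{align}
```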
@vitkl the advantage of their implementation, which we can't match, is that they can handle data in sparse format, because a lot of the actual computation can be skipped for zero counts. So I don't expect we would be faster than their implementation. Even if we could be, their closed-form VI updates are likely to outperform black-box VI (BBVI) in Pyro.
But that said, we could offer amortized inference (an encoder), and even extend it to handle batch effects. I intended the initial implementation of this to be an exercise for @watiss. I anticipate we will have something before the end of January.
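To illustrate why zero counts are cheap under a Poisson factorization likelihood, here is a minimal NumPy sketch. It is not the scHPF code, and the function names are hypothetical. It only shows the principle: the log-likelihood needs per-entry rates at the nonzero positions only, while the sum of all rates factorizes over the latent factors and never requires the dense cells x genes matrix.

```python
import math
import numpy as np


def poisson_loglik_dense(X, theta, beta):
    """Reference version: forms the full cells x genes rate matrix."""
    rate = theta @ beta.T
    log_fact = np.vectorize(math.lgamma)(X + 1.0)  # log(x!)
    return float(np.sum(X * np.log(rate) - rate - log_fact))


def poisson_loglik_sparse(rows, cols, vals, theta, beta):
    """Same quantity from COO triplets of the nonzero counts only."""
    # Per-entry rates are needed only where the count is nonzero.
    rate_nz = np.einsum("nk,nk->n", theta[rows], beta[cols])
    log_fact = np.array([math.lgamma(v + 1.0) for v in vals])
    nonzero_term = np.sum(vals * np.log(rate_nz) - log_fact)
    # The sum of all rates factorizes:
    # sum_ig sum_k theta_ik beta_gk = sum_k (sum_i theta_ik) (sum_g beta_gk)
    total_rate = float(theta.sum(axis=0) @ beta.sum(axis=0))
    return float(nonzero_term - total_rate)


# Small synthetic example (all sizes and hyperparameters are arbitrary).
rng = np.random.default_rng(0)
n_cells, n_genes, n_factors = 50, 40, 5
theta = rng.gamma(0.3, 1.0, size=(n_cells, n_factors)) + 1e-3
beta = rng.gamma(0.3, 1.0, size=(n_genes, n_factors)) + 1e-3
X = rng.poisson(theta @ beta.T).astype(float)

rows, cols = np.nonzero(X)
vals = X[rows, cols]

dense = poisson_loglik_dense(X, theta, beta)
sparse = poisson_loglik_sparse(rows, cols, vals, theta, beta)
print(np.isclose(dense, sparse))
```

The sparse version costs on the order of (number of nonzeros) x (number of factors) rather than cells x genes x factors. A generic Pyro model that evaluates a Poisson likelihood on a dense mini-batch would not get this saving automatically.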
I see. I would be quite curious to see how it works. I wish there was more work explicitly comparing Pyro/VI/amortized inference to closed-form updates, to help decide exactly how much worse amortized inference is in comparison and when it is worth investing in deriving closed-form VI updates (and sticking to simpler models where that can be done).