We should add an is_sparse parameter to our distributions (at least NB, with the log-likelihood computation here).
For an example of this in Pyro's Poisson distribution, see here.
If X is known to be sparse, we'd only have to compute the middle term for the 0 entries, and the full expression only for the nonzero ones (r is the inverse dispersion, m is the mean, k is the observed count x).
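
To make the saving concrete, here is a minimal pure-Python sketch. It assumes the usual mean / inverse-dispersion form of the NB log-likelihood, log Γ(k+r) − log Γ(r) − log Γ(k+1) + r·log(r/(r+m)) + k·log(m/(r+m)), with m > 0 and r > 0. At k = 0 the log-gamma terms cancel and the last term vanishes, leaving only r·log(r/(r+m)), which is presumably the "middle term" referred to above. The function names and the dict-based sparse row are hypothetical, for illustration only; they are not our existing API or Pyro's.

```python
import math


def nb_log_prob(k, m, r):
    """Full NB log-likelihood for one count k, mean m > 0, inverse dispersion r > 0."""
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + m))
        + k * math.log(m / (r + m))
    )


def nb_log_prob_zero(m, r):
    """Shortcut for k == 0: log-gamma terms cancel and the k * log(...) term is zero."""
    return r * math.log(r / (r + m))


def nb_log_likelihood_sparse(nonzero, means, r):
    """Sum the NB log-likelihood over one row given only its nonzero entries.

    nonzero: dict {column index: nonzero count} (hypothetical sparse row format)
    means:   list of per-column means m
    r:       scalar inverse dispersion
    """
    total = 0.0
    for j, m in enumerate(means):
        k = nonzero.get(j, 0)
        if k == 0:
            total += nb_log_prob_zero(m, r)  # cheap path, no lgamma calls
        else:
            total += nb_log_prob(k, m, r)    # full expression
    return total


# Example: a row with 5 columns, only two of which are nonzero.
row = {1: 3, 4: 1}
means = [0.5, 2.0, 0.1, 1.5, 0.7]
sparse_ll = nb_log_likelihood_sparse(row, means, r=2.0)
```

In a tensor implementation the same idea would apply the cheap expression to every entry and then correct only the nonzero indices, so the log-gamma evaluations scale with the number of nonzeros instead of the full matrix size.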
