Fix and standardize input and output shapes of random variable methods
The output shapes of `Normal.pdf` are inconsistent (I haven't checked other RVs); see this 2-d example:
>>> n = Normal(np.zeros((2,)), np.eye(2))
>>> x = np.random.randn(2,)
>>> n.pdf(x)
0.05358174006777249
>>> x = np.random.randn(2,1)
>>> n.pdf(x)
array([0.04105293, 0.11377797])
>>> x = np.random.randn(1,2)
>>> n.pdf(x)
0.11294177981074723
>>> x = np.random.randn(3,2)
>>> n.pdf(x)
array([0.12783197, 0.02004086, 0.127179 ])
The second call should raise an error (it does raise one if the second axis of x is longer than 1).
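For context, the wrong (2, 1) result is consistent with NumPy silently broadcasting the column vector against the mean. A minimal sketch of the suspected mechanism, independent of the actual implementation:

```python
import numpy as np

mean = np.zeros(2)
x = np.random.randn(2, 1)  # a column vector, not a valid batch of 2-d points

# NumPy broadcasts (2, 1) - (2,) to shape (2, 2), so a pdf computed from
# `x - mean` silently evaluates two unintended points instead of raising.
diff = x - mean
assert diff.shape == (2, 2)
```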
Talking about shapes, there's another problem with 1d RVs:
>>> n = Normal(np.zeros((1,)), np.eye(1))
>>> x = np.random.randn(10,1)
>>> n.pdf(x).shape
(10,)
>>> x = np.random.randn(1,1)
>>> n.pdf(x).shape
()
but:
>>> n = Normal(0,1)
>>> x = np.random.randn(10,1)
>>> n.pdf(x).shape
(10, 1)
>>> x = np.random.randn(1,1)
>>> n.pdf(x).shape
(1, 1)
These should be the same, no matter how n was defined.
I'm in favor of squeezing axes, but for x it might sometimes make sense to keep them around, given that they mean something: (num_data, input_dim).
Oh damn, let's see if I understand the problem:
> >>> n = Normal(np.zeros((2,)), np.eye(2))
> >>> x = np.random.randn(2,)
> >>> n.pdf(x)
> 0.05358174006777249
This is intended behavior.
> >>> x = np.random.randn(2,1)
> >>> n.pdf(x)
> array([0.04105293, 0.11377797])
I agree, this should raise an error.
> >>> x = np.random.randn(1,2)
> >>> n.pdf(x)
> 0.11294177981074723
This should have shape (1,)
> >>> x = np.random.randn(3,2)
> >>> n.pdf(x)
> array([0.12783197, 0.02004086, 0.127179 ])
This is intended behavior.
> Talking about shapes, there's another problem with 1d RVs:
> >>> n = Normal(np.zeros((1,)), np.eye(1))
> >>> x = np.random.randn(10,1)
> >>> n.pdf(x).shape
> (10,)
This is intended behavior.
> >>> x = np.random.randn(1,1)
> >>> n.pdf(x).shape
> ()
Here we should not squeeze, i.e. the output shape should be (1,).
> but:
> >>> n = Normal(0,1)
> >>> x = np.random.randn(10,1)
> >>> n.pdf(x).shape
> (10, 1)
> >>> x = np.random.randn(1,1)
> >>> n.pdf(x).shape
> (1, 1)
> These should be the same, no matter how n was defined.
These two are also intended behavior, since the shape of the random variable is actually () as opposed to (1,) in the example before. So I don't think they should be the same, since the shape of n should matter.
> I'm in favor of squeezing axes, but for x it might sometimes make sense to keep them around, given that they mean something: (num_data, input_dim).
I'm not sure if I understood you correctly, but I think we should not squeeze axes. It makes sense to distinguish between random variable shapes () and (1,), since we want to make random variables "equivalent" to numpy arrays.
However, I do agree that, to resolve this issue, we should require the inputs to n.pdf to have shape batch_shape + rv_shape, and the output should then have shape batch_shape.
This would mean that
>>> n = Normal(0,1)
>>> x = np.random.randn(10,1)
>>> n.pdf(x).shape
(10, 1)
>>> x = np.random.randn(1,1)
>>> n.pdf(x).shape
(1, 1)
and
>>> n = Normal(np.zeros((1,)), np.eye(1))
>>> x = np.random.randn(10,1)
>>> n.pdf(x).shape
(10,)
>>> x = np.random.randn(1,1)
>>> n.pdf(x).shape
(1,)
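As a sketch of how that contract could be enforced, consider a small shape helper (`pdf_output_shape` is a hypothetical name, not existing API):

```python
def pdf_output_shape(input_shape, rv_shape):
    """Split an input shape into batch_shape + rv_shape and return batch_shape.

    Raises if the trailing axes do not match the random variable's shape.
    """
    ndim = len(rv_shape)
    if ndim > 0 and tuple(input_shape[-ndim:]) != tuple(rv_shape):
        raise ValueError(
            f"input shape {input_shape} does not end in rv shape {rv_shape}"
        )
    return tuple(input_shape[: len(input_shape) - ndim])

# Normal(0, 1) has rv shape (), so nothing is stripped:
assert pdf_output_shape((10, 1), ()) == (10, 1)
# Normal(np.zeros((1,)), np.eye(1)) has rv shape (1,):
assert pdf_output_shape((10, 1), (1,)) == (10,)
assert pdf_output_shape((1, 1), (1,)) == (1,)
```

This also makes the buggy case above an error: for rv shape (2,), an input of shape (2, 1) does not end in (2,) and is rejected.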
What do you think @alpiges @JonathanWenger?
I agree with your assessment @marvinpfoertner. This behaviour is particularly nasty:
>>> x = np.random.randn(2,1)
>>> n.pdf(x)
array([0.04105293, 0.11377797])
I guess there is some implicit NumPy broadcasting that's going on.
> >>> x = np.random.randn(2,1)
> >>> n.pdf(x)
> array([0.04105293, 0.11377797])
True, this is a broadcasting bug and needs to be fixed. All the others are nitpicky (because basically inconsequential) shape inconsistencies (or maybe not even that).
Re

> >>> n = Normal(0,1)
> >>> x = np.random.randn(10,1)
> >>> n.pdf(x).shape
> (10, 1)
> >>> x = np.random.randn(1,1)
> >>> n.pdf(x).shape
> (1, 1)
I get your point, but I considered the definition of n as a lazy way to define a one-dimensional RV. That means I would expect the last axis to be squeezed due to, effectively, a product over dimensions. But true, that's ambiguous, because otherwise there is no way to tell whether the user wants a batch of shape (N1, N2) or inputs of shape (N, D). On the other hand, I don't see why 1d should be treated differently from multi-D, and then you'd expect inputs like (N1, N2, D) --- if at all.
> ```python
> >>> x = np.random.randn(2,1)
> >>> n.pdf(x)
> array([0.04105293, 0.11377797])
> ```
>
> True, this is a broadcasting bug and needs to be fixed.
Giving this some more thought: maybe we even want to allow broadcasting here, in case one actually intends to evaluate at a point where all coordinates are equal. This would mean that this is not a bug that needs fixing, but rather a feature that needs documentation and testing.
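Concretely (a sketch, assuming an rv of shape (2,)), broadcasting a (2, 1) input along the last axis would mean evaluating at points whose coordinates are all equal:

```python
import numpy as np

x = np.array([[0.5], [1.5]])  # shape (2, 1)
# Broadcasting against an rv of shape (2,) expands the last axis, so each
# row becomes a point with all coordinates equal: [0.5, 0.5] and [1.5, 1.5].
pts = np.broadcast_to(x, (2, 2))
assert np.array_equal(pts, [[0.5, 0.5], [1.5, 1.5]])
```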
> On the other hand, I don't see why 1d should be treated differently from multi-D, and then you'd expect inputs like (N1, N2, D) --- if at all.
One good reason for keeping the 1-dimensions around is that (N, 1) + (N', N, K) broadcasts differently than (N,) + (N', N, K) if K = N.
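The difference is easy to verify: when K = N the result shapes coincide, but the element pairing does not:

```python
import numpy as np

b = np.zeros((2, 3, 3))             # shape (N', N, K) with K = N = 3
a_col = np.arange(3).reshape(3, 1)  # shape (N, 1)
a_flat = np.arange(3)               # shape (N,)

r_col = a_col + b    # a_col varies along axis -2 (the N axis)
r_flat = a_flat + b  # a_flat varies along axis -1 (the K axis)
assert r_col.shape == r_flat.shape == (2, 3, 3)
assert not np.array_equal(r_col, r_flat)
```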
Broadcasting is exactly the reason why I think squeezing user input is not a good idea. IMHO, we should try to stick to input shape = output shape.
The intended behavior for all methods (.pdf, .cdf, etc.) is that they take arguments of shape (n_0, n_1, ..., d_0, d_1, ...), where the n_i determine the batch axes and the d_i the dimensions of the space of the random variable. However, note that scalar random variables have shape (). In particular we obtain:
```python
# Scalar
rv  # shape=()
x = np.arange(5)  # shape=(5,)
y = rv.pdf(x)  # shape=(5,)

# Different case:
x = np.random.randn(10, 1)  # shape=(10, 1)
z = rv.pdf(x)  # shape=(10, 1)

# Vector
rv  # shape=(2,)
x = np.random.randn(5, 2)  # shape=(5, 2)
y = rv.pdf(x)  # shape=(5,)

# Matrix-variate
rv  # shape=(4, 3)
x = np.random.randn(5, 6, 4, 3)  # shape=(5, 6, 4, 3)
y = rv.pdf(x)  # shape=(5, 6)
```
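A hedged sketch of how a pdf method could implement this contract for a vector-variate normal by flattening the batch axes (`batched_pdf` is a hypothetical helper, not existing API; the density itself is delegated to SciPy):

```python
import numpy as np
from scipy.stats import multivariate_normal


def batched_pdf(x, mean, cov):
    """Evaluate a d-dimensional normal pdf on inputs of shape
    batch_shape + (d,), returning an array of shape batch_shape."""
    x = np.asarray(x)
    d = np.shape(mean)[0]
    if x.shape[-1:] != (d,):
        raise ValueError(f"last axis of x must have length {d}, got {x.shape}")
    batch_shape = x.shape[:-1]
    # Flatten all batch axes, evaluate, then restore the batch shape.
    values = multivariate_normal(mean=mean, cov=cov).pdf(x.reshape(-1, d))
    return np.reshape(values, batch_shape)


# Vector case with two batch axes, analogous to the matrix-variate example:
y = batched_pdf(np.random.randn(5, 6, 3), np.zeros(3), np.eye(3))
assert y.shape == (5, 6)
```

The matrix-variate case would flatten the trailing event axes the same way before delegating to the density.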
To Do
- [ ] Fix broadcasting bug as described above
- [ ] Adjust docstring to include the information about input shape
- [ ] Write doctests for all cases above to illustrate the behaviour