fuel icon indicating copy to clipboard operation
fuel copied to clipboard

False singleton dimension for scalar targets considered harmful

Open dwf opened this issue 10 years ago • 5 comments

Consider this code:

In [1]: from fuel.datasets import MNIST

In [2]: import numpy

In [3]: first_five_targets = MNIST('train').get_data(request=range(5))[-1]  # They happen to be 5, 0, 4, 1, 9

Let's consider the output of some hypothetical neural network:

In [4]: outputs = numpy.array([[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
   ...:                       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
   ...:                       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
   ...:                       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
   ...:                       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
   ...:                      ])

In [5]: predictions = outputs.argmax(axis=-1)

Now let's calculate our accuracy. Should be 4/5 = 0.8.

In [6]: (first_five_targets == predictions).mean()
Out[6]: 0.20000000000000001

This silently broadcasted, incorrectly, to a 5x5 binary matrix. To achieve the correct result we need

In [7]: (first_five_targets.squeeze() == predictions).mean()
Out[7]: 0.80000000000000004

This is a tricky bug to spot, and these sorts of bugs could be obviated by considering scalar targets scalar and batches of scalar targets 1-dimensional.

dwf avatar May 23 '15 20:05 dwf

@dwf is this issue still relevant?

vdumoulin avatar Oct 06 '15 14:10 vdumoulin

I think it is, since it also makes it trickier to adapt non-Fuel code to use Fuel, as I experienced when adapting the tutorials for the deep learning summer school.

lamblin avatar Oct 14 '15 17:10 lamblin

Indeed, it's kind of annoying to remember to flatten each time. I also don't totally buy the metadata argument that was the justification: that the dimension label should be "index". That is not, in fact, the label for the dimension but a label for a particular column on that dimension. What you're describing with that is more the semantics of elements of the "space" in which the items lie.

dwf avatar Oct 14 '15 19:10 dwf

I personally like the consistency that it is a 2D array, because it makes having a single label just a special case of the multiple label case i.e. batch[i] returns an array of labels for a sample. In this case that array has length 1 (i.e. batch is a column vector), but that doesn't always have to be the case, and batch could just as well be a 2D array or a nested list.

bartvm avatar Oct 14 '15 19:10 bartvm

If we're already giving up on the elements batch[i] being arrays (i.e. they could be lists), then I don't care if they're sequences or not. The tradeoff of using atleast_1d() in places where I expect an array is better than accidental silent broadcasting, as operations that explicitly require a sequence (or more specifically an array) and receive a scalar will usually raise an error, whereas operations that expect a vector and get a matrix will often not (as demonstrated by the example in the description text).

dwf avatar Oct 14 '15 20:10 dwf