
Suggest: the batchsize dimension should never be broadcastable, even when batchsize == 1

Open guoxuesong opened this issue 8 years ago • 8 comments

Following code in lasagne/layers/input.py:

        input_var_type = T.TensorType(theano.config.floatX,
                                      [s == 1 for s in self.shape])

would be better changed to:

        input_var_type = T.TensorType(theano.config.floatX,
                                      [False] + [s == 1 for s in self.shape[1:]])

guoxuesong avatar Aug 09 '17 05:08 guoxuesong
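[Editor's note: the difference between the current expression and the proposed one can be illustrated with a plain-Python sketch; the shape `(1, 784)` is a hypothetical example, not taken from the issue.]

```python
# Hedged sketch: compare the broadcast patterns the two expressions
# would produce when batchsize == 1 (shape is a hypothetical example).
shape = (1, 784)  # batchsize 1, 784 features

# Current code in lasagne/layers/input.py: every size-1 dim is broadcastable.
current = [s == 1 for s in shape]

# Proposed change: the batch dimension is forced non-broadcastable.
proposed = [False] + [s == 1 for s in shape[1:]]

print(current)   # [True, False]
print(proposed)  # [False, False]
```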

Sorry for the delay. Why would you want this? There could be use cases for a broadcastable batch dimension.

f0k avatar Sep 08 '17 18:09 f0k

@f0k Consider this case: we train a model with batchsize > 1 and run inference with batchsize = 1. This breaks code that treats the output of a layer with shape (batchsize,) as a vector: during inference it is not a plain vector, because dim 0 is broadcastable, but during training it is.

guoxuesong avatar Sep 09 '17 07:09 guoxuesong

This breaks code that treats the output of a layer with shape (batchsize,) as a vector.

Which layer breaks when it sees a vector with broadcast pattern (True,) instead of (False,)? And if you train with batchsize > 1, why not leave the batchsize unspecified as None when compiling for inference?

f0k avatar Sep 15 '17 14:09 f0k
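[Editor's note: a plain-Python sketch of this suggestion, with a hypothetical shape. Leaving the batchsize as None already yields a non-broadcastable batch dimension under the existing expression, because `None == 1` is `False`.]

```python
# Hedged sketch: with batchsize unspecified as None, the existing
# expression in lasagne/layers/input.py already marks the batch
# dimension as non-broadcastable, since None == 1 is False.
shape = (None, 784)  # hypothetical: unspecified batchsize, 784 features
pattern = [s == 1 for s in shape]
print(pattern)  # [False, False]
```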

@f0k I see your point, but leaving the batchsize unspecified as None looks like a workaround tip rather than the intended design.

Following Lasagne's documentation, if I want better performance and I know the batchsize, I should provide it. If we treat the documentation as a statement of the design, I think the implementation should support the straightforward reading of that documentation.

Maybe a broadcastable batch dimension is useful in some cases, but nobody knows that without reading the code, whereas expecting batchsize == 1 to behave the same as batchsize > 1 is reasonable from the documentation alone.

By the way, the details of my case: I passed classification targets of shape (batchsize,) through an InputLayer and called categorical_crossentropy on the layer's output. So no Lasagne code itself breaks.

guoxuesong avatar Sep 15 '17 15:09 guoxuesong

Following Lasagne's documentation, if I want better performance and I know the batchsize, I should provide it.

Yes, I agree. We want it to work better or equally well when providing the batchsize, not worse.

The details of my case: I passed classification targets of shape (batchsize,) through an InputLayer and called categorical_crossentropy on the layer's output. So no Lasagne code itself breaks.

I still don't understand what breaks -- it shouldn't make a difference whether it's broadcastable or not?

f0k avatar Sep 15 '17 16:09 f0k

@f0k When calling lasagne.objectives.categorical_crossentropy, Theano raises a TypeError: integer vector required for argument: true_one_of_n(got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))

Traceback (most recent call last):
  File "mnist.py", line 39, in <module>
    main()
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 2197, in main
    run(args)
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 1478, in run
    ordered_errors = get_ordered_errors(raw_errors)
  File "/home/ubuntu/deepstacks/deepstacks/lasagne/utils.py", line 17, in ordered_errors
    res += [[prefix+t, map(curry(lasagne.layers.get_output,deterministic=deterministic), errors[t])]]
  File "/home/ubuntu/deepstacks/deepstacks/utils/curry.py", line 14, in __call__
    return self.fun(*(self.pending + args), **kw)
  File "/home/ubuntu/Lasagne/lasagne/layers/helper.py", line 197, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/home/ubuntu/Lasagne/lasagne/layers/merge.py", line 352, in get_output_for
    output = self.merge_function(output, input)
  File "/home/ubuntu/deepstacks/deepstacks/framework/macros.py", line 13, in <lambda>
    'equal':[target,'classify',lambda x,y:r*lasagne.objectives.categorical_crossentropy(x,y),],
  File "/home/ubuntu/Lasagne/lasagne/objectives.py", line 179, in categorical_crossentropy
    return theano.tensor.nnet.categorical_crossentropy(predictions, targets)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 2099, in categorical_crossentropy
    return crossentropy_categorical_1hot(coding_dist, true_dist)
  File "/home/ubuntu/Theano/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 1452, in make_node
    tensor.lvector))
TypeError: integer vector required for argument: true_one_of_n(got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))

The following patch was applied to deepstacks/deepstacks/framework/main.py to reproduce the behaviour of lasagne.layers.InputLayer.

--- main.py     2017-09-16 06:08:06.931262231 +0800
+++ main-likelasagne.py 2017-09-16 06:08:08.427262241 +0800

@@ -1453,7 +1453,7 @@
         logging.info(string.strip(out.getvalue()))
         name=k
         input_var_type = T.TensorType(dtypes[k],
-                [False,]+[s == 1 for s in m[k][1:]])
+                [s == 1 for s in m[k][:]])
         var_name = ("%s.input" % name) if name is not None else "input"
         input_var = input_var_type(var_name)
         inputs[k]=lasagne.layers.InputLayer(name=name,input_var=input_var,shape=m[k])

guoxuesong avatar Sep 15 '17 21:09 guoxuesong

When calling lasagne.objectives.categorical_crossentropy, Theano raises a TypeError

Hmm, this should either be changed in Theano's or Lasagne's categorical_crossentropy(). It should be able to handle a singleton batch dimension.

f0k avatar Dec 31 '17 12:12 f0k
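[Editor's note: semantically, nothing in the loss requires batchsize > 1. A NumPy sketch of cross-entropy with integer class targets (hypothetical values, not Lasagne's or Theano's implementation) handles a single-element batch like any other; the failure above is purely a broadcast-pattern type check.]

```python
import numpy as np

def categorical_crossentropy_1hot(predictions, targets):
    """Cross-entropy with integer class targets (illustrative sketch).

    predictions: (batchsize, n_classes) probabilities
    targets: (batchsize,) integer class indices
    """
    return -np.log(predictions[np.arange(len(targets)), targets])

# A batch of size 1 is handled like any other batch.
preds = np.array([[0.7, 0.2, 0.1]])
targets = np.array([0])
loss = categorical_crossentropy_1hot(preds, targets)
print(loss)  # one loss value per sample: -log(0.7)
```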

I agree

guoxuesong avatar Jan 24 '18 12:01 guoxuesong