Suggestion: the batchsize dimension should always be non-broadcastable, even if batchsize == 1.
The following code in lasagne/layers/input.py:

```python
input_var_type = T.TensorType(theano.config.floatX,
                              [s == 1 for s in self.shape])
```

would be better changed to:

```python
input_var_type = T.TensorType(theano.config.floatX,
                              [False] + [s == 1 for s in self.shape[1:]])
```
Sorry for the delay. Why would you want this? There could be use cases for a broadcastable batch dimension.
@f0k Consider this case: we train a model using batchsize > 1 and run inference using batchsize = 1. This breaks code that uses the output of a layer of shape (batchsize,) as a vector: at inference time it is not a vector, because dim 0 is broadcastable, but at training time it is a vector.
> This will break the code using the output of a layer which has shape (batchsize,) as a vector.
Which layer breaks when it sees a vector of broadcast pattern (True,) instead of (False,)? And if you train with batchsize > 1, why not leave the batchsize unspecified as None when compiling for inference?
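For what it's worth, leaving the batchsize unspecified already yields a non-broadcastable batch dimension with the current `InputLayer` code, because `None == 1` is `False`. A minimal sketch (the shape `(None, 784)` is just a made-up example):

```python
# With the current InputLayer code, an unspecified batchsize (None) already
# produces a non-broadcastable batch dimension, since None == 1 is False.
shape = (None, 784)
pattern = [s == 1 for s in shape]
print(pattern)  # [False, False]
```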
@f0k I get your point, but leaving the batchsize unspecified as None looks like a workaround tip rather than the intended design.
Following the Lasagne documentation, if I want better performance and I know the batchsize, I should provide it. If we treat the documentation as a representation of the design, I think the implementation should support what a reader would directly infer from the documentation.
Maybe a broadcastable batch dimension is useful in some cases, but nobody knows that without reading the code, whereas assuming that batchsize == 1 behaves the same as batchsize > 1 is reasonable if one only reads the documentation.
By the way, the details of my case: I passed classification targets of shape (batchsize,) through an InputLayer and called categorical_crossentropy on the layer's output. So no Lasagne code itself breaks.
> Follow the documents of lasagne, If I want better performance and I know the batchsize, I should provide it.
Yes, I agree. We want it to work better or equally well when providing the batchsize, not worse.
> The detail of my case, I passed classify target of shape (batchsize,) through a InputLayer, and call categorical_crossentropy on the output of the layer. So no lasagne code breaks.
I still don't understand what breaks -- it shouldn't make a difference whether it's broadcastable or not?
@f0k When calling lasagne.objectives.categorical_crossentropy, Theano raises a `TypeError: integer vector required for argument: true_one_of_n (got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))`:
```
Traceback (most recent call last):
  File "mnist.py", line 39, in <module>
    main()
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 2197, in main
    run(args)
  File "/home/ubuntu/deepstacks/deepstacks/framework/main.py", line 1478, in run
    ordered_errors = get_ordered_errors(raw_errors)
  File "/home/ubuntu/deepstacks/deepstacks/lasagne/utils.py", line 17, in ordered_errors
    res += [[prefix+t, map(curry(lasagne.layers.get_output,deterministic=deterministic), errors[t])]]
  File "/home/ubuntu/deepstacks/deepstacks/utils/curry.py", line 14, in __call__
    return self.fun(*(self.pending + args), **kw)
  File "/home/ubuntu/Lasagne/lasagne/layers/helper.py", line 197, in get_output
    all_outputs[layer] = layer.get_output_for(layer_inputs, **kwargs)
  File "/home/ubuntu/Lasagne/lasagne/layers/merge.py", line 352, in get_output_for
    output = self.merge_function(output, input)
  File "/home/ubuntu/deepstacks/deepstacks/framework/macros.py", line 13, in <lambda>
    'equal':[target,'classify',lambda x,y:r*lasagne.objectives.categorical_crossentropy(x,y),],
  File "/home/ubuntu/Lasagne/lasagne/objectives.py", line 179, in categorical_crossentropy
    return theano.tensor.nnet.categorical_crossentropy(predictions, targets)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 2099, in categorical_crossentropy
    return crossentropy_categorical_1hot(coding_dist, true_dist)
  File "/home/ubuntu/Theano/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/ubuntu/Theano/theano/tensor/nnet/nnet.py", line 1452, in make_node
    tensor.lvector))
TypeError: integer vector required for argument: true_one_of_n(got type: TensorType(int64, (True,)) instead of: TensorType(int64, vector))
```
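The root cause is that Theano treats two tensor types as interchangeable only when both the dtype and the broadcastable pattern match. A minimal stand-in sketch, not Theano's real implementation (the class `TensorTypeStub` is made up for illustration):

```python
# Minimal stand-in for Theano's TensorType equality check -- not the real
# implementation, only an illustration of why the op rejects the input:
# two tensor types match only when both dtype and broadcastable pattern agree.
class TensorTypeStub:
    def __init__(self, dtype, broadcastable):
        self.dtype = dtype
        self.broadcastable = tuple(broadcastable)

    def __eq__(self, other):
        return (self.dtype == other.dtype
                and self.broadcastable == other.broadcastable)

# What crossentropy_categorical_1hot expects: a non-broadcastable int vector.
lvector = TensorTypeStub("int64", (False,))
# What InputLayer builds for a target of shape (1,): a broadcastable vector.
batch1_target = TensorTypeStub("int64", (True,))

print(lvector == batch1_target)  # False -> hence the TypeError above
```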
The following patch was applied to deepstacks/deepstacks/framework/main.py to reproduce the behaviour of lasagne.layers.InputLayer.
```diff
--- main.py 2017-09-16 06:08:06.931262231 +0800
+++ main-likelasagne.py 2017-09-16 06:08:08.427262241 +0800
@@ -1453,7 +1453,7 @@
 logging.info(string.strip(out.getvalue()))
 name=k
 input_var_type = T.TensorType(dtypes[k],
-        [False,]+[s == 1 for s in m[k][1:]])
+        [s == 1 for s in m[k][:]])
 var_name = ("%s.input" % name) if name is not None else "input"
 input_var = input_var_type(var_name)
 inputs[k]=lasagne.layers.InputLayer(name=name,input_var=input_var,shape=m[k])
```
> When calling lasagne.objectives.categorical_crossentropy, theano would raise a TypeError
Hmm, this should either be changed in Theano's or Lasagne's categorical_crossentropy(). It should be able to handle a singleton batch dimension.
I agree.