Sander Dieleman
> But what if you stack two of those things on top of each other (with some convolutions in between) and then end in a dense layer? It's wasteful to...
> If the TransformerLayer is not the first thing in your pipeline, then you don't necessarily know the input shape, and you don't necessarily have a particular output shape in...
Regarding b), I guess it's better to support it, since it doesn't really add any overhead, does it? Nor does it make the code any more difficult to understand (or...
I see. In that case maybe we shouldn't support it. Maybe let's see if anyone can come up with a plausible use case within the next two days or so...
This looks great! If anyone wants to throw it into a PR, that would be very welcome :)
That's a good point. However, people are much more likely to change the nonlinearity of the layer than the initialization strategy, so I'm not sure if that would be a...
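To illustrate the point above, here is a minimal sketch (not Lasagne's actual implementation; the class and method names are hypothetical) of a layer where the nonlinearity is an easily swappable constructor keyword, while the initialization strategy stays an internal default that users rarely touch:

```python
import math

def rectify(x):
    # ReLU nonlinearity, the typical default
    return max(0.0, x)

class DenseLayer:
    """Hypothetical sketch: nonlinearity is a per-layer keyword,
    while the weight initialization strategy is a fixed default."""

    def __init__(self, num_units, nonlinearity=rectify):
        self.num_units = num_units
        self.nonlinearity = nonlinearity  # easy to swap per layer
        # initialization strategy is hard-coded here; changing it
        # would require subclassing or a separate mechanism
        self.W = [0.0] * num_units

    def get_output_for(self, x):
        return [self.nonlinearity(x + w) for w in self.W]

# swapping the nonlinearity is a one-keyword change
tanh_layer = DenseLayer(3, nonlinearity=math.tanh)
```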
I like `wrt`, it's consistent with Theano itself.
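For context, `wrt` ("with respect to") is the keyword Theano uses in `theano.grad(cost, wrt=...)` to name the variables to differentiate against. A small pure-Python sketch (hypothetical helper, not Theano's implementation) showing the naming convention with a central-difference gradient:

```python
def grad(cost_fn, wrt, eps=1e-6):
    """Numerical gradient of cost_fn at the point `wrt`.

    The keyword name `wrt` mirrors Theano's
    theano.grad(cost, wrt=...) convention; this helper itself
    is just an illustrative finite-difference sketch.
    """
    return [
        (cost_fn(wrt[:i] + [w + eps] + wrt[i + 1:])
         - cost_fn(wrt[:i] + [w - eps] + wrt[i + 1:])) / (2 * eps)
        for i, w in enumerate(wrt)
    ]

# gradient of sum of squares is 2*x per coordinate
g = grad(lambda p: sum(x * x for x in p), wrt=[1.0, -2.0])
```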
I've only glanced over the proposal so far, but it looks good to me. It complicates the code quite a bit, unfortunately, but I think the use cases for this...
> Maybe that's really something we shouldn't care too much about. If we decide not to worry about it, that would mean we are free to rename `Layer`, right? Or...
Right, makes sense. Bummer :)