Sander Dieleman

136 comments by Sander Dieleman

> But what if you stack two of those things on top of each other (with some convolutions in between) and then end in a dense layer? It's wasteful to...

> If the TransformerLayer is not the first thing in your pipeline, then you don't necessarily know the input shape, and you don't necessarily have a particular output shape in...

Regarding b), I guess it's better to support it, since it doesn't really add any overhead, does it? Nor does it make the code any more difficult to understand (or...

I see. In that case maybe we shouldn't support it. Maybe let's see if anyone can come up with a plausible use case within the next two days or so...

This looks great! If anyone wants to throw it into a PR, that would be very welcome :)

That's a good point. However, people are much more likely to change the nonlinearity of the layer than the initialization strategy, so I'm not sure if that would be a...

I like `wrt`; it's consistent with Theano itself.

I've only glanced over the proposal so far, but it looks good to me. It complicates the code quite a bit, unfortunately, but I think the use cases for this...

> Maybe that's really something we shouldn't care too much about. If we decide not to worry about it, that would mean we are free to rename `Layer`, right? Or...

Right, makes sense. Bummer :)