Kyle Daruwalla
That's correct, thanks for catching the mistake! If you have time to file a PR, then that would be appreciated.
This is not true for the current LSTM implementation, but it could be done for `Recur`. Is this something people use, or can we close this as a non-priority?
If you have code, a PR is always better for discussing a possible change. This is not meant to be pushback, but is using `permutedims` before the BN not...
Yeah, if we did do this, I would prefer to do it across the board rather than have some layers with a fixed `features x channel x samples` layout and some flexible...
That's true. `PermuteDimsArray` is a zero-copy version of `permutedims`.
`permutedims` does copy AFAIK. It isn't default no-copy like `reshape`. Sounds like a good proposal though.
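A quick sketch of the copy semantics under discussion (plain Julia, no Flux needed):

```julia
A = rand(3, 4)

# `permutedims` allocates a fresh array, so writes to the result
# never touch the original.
B = permutedims(A, (2, 1))
B[1, 1] = -1.0
A[1, 1] == -1.0  # false: B is an independent copy

# `PermuteDimsArray` is a zero-copy wrapper that only remaps
# indices, so writes go straight through to `A`.
C = PermuteDimsArray(A, (2, 1))
C[1, 1] = -1.0
A[1, 1] == -1.0  # true: C shares A's memory
```

This is also why `reshape` and `PermuteDimsArray` behave alike here (views over the same buffer), while `permutedims` behaves like `copy`.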
You are not using Adapt.jl here, because `x` starts off on the GPU before being passed to `PermuteDimsArray`. You would be using Adapt.jl if you started with a `PermuteDimsArray{Array}` then...
Okay, that's because broadcasting with a `PermuteDimsArray` triggers `getindex`, which on the GPU falls back to scalar indexing.
Fully agree with updating the design to be non-mutating. There are two options we've discussed in the past: 1. `y, h = cell(x, h)` like here (I guess this PR...
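A minimal sketch of what option 1 (`y, h = cell(x, h)`) could look like. The `SimpleRNNCell` below is hypothetical and written in plain Julia to show the calling convention, not Flux's actual implementation:

```julia
# Hypothetical stateless cell: all recurrent state is passed in
# and returned, never mutated inside the struct.
struct SimpleRNNCell{M, V}
    Wx::M   # input-to-hidden weights
    Wh::M   # hidden-to-hidden weights
    b::V    # bias
end

SimpleRNNCell(in::Int, hidden::Int) =
    SimpleRNNCell(randn(hidden, in), randn(hidden, hidden), zeros(hidden))

# Non-mutating call: return the output and the new hidden state.
function (c::SimpleRNNCell)(x, h)
    h′ = tanh.(c.Wx * x .+ c.Wh * h .+ c.b)
    return h′, h′   # y, h — for a vanilla RNN the output is the state
end

cell = SimpleRNNCell(2, 3)
h = zeros(3)
y, h = cell(rand(2), h)   # caller threads the state explicitly
```

Threading the state explicitly like this keeps the cell a pure function, which plays nicely with AD and avoids the hidden mutation that `Recur` currently relies on.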
> For Functors to use properties-not-fields everywhere, I think it would have to be functor(guide, target) rather than functor(::Type{Guide}, target). I haven't thought through what else this would change. In...
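For context on why the signature matters: `propertynames` needs a value, while `fieldnames` only needs a type, so a properties-aware `functor` would want the instance in hand rather than just `::Type{Guide}`. A minimal illustration of the fields/properties split (the `Wrapped` type is hypothetical, not part of Functors.jl):

```julia
# Illustrative only: a type whose public *properties* differ from
# its raw *fields* — the distinction behind "properties-not-fields".
struct Wrapped
    _data::Vector{Float64}
end

# Expose a virtual `data` property instead of the private `_data` field.
Base.propertynames(::Wrapped) = (:data,)
Base.getproperty(w::Wrapped, s::Symbol) =
    s === :data ? getfield(w, :_data) : getfield(w, s)

w = Wrapped([1.0, 2.0])

fieldnames(Wrapped)   # (:_data,) — all a Type-based functor can see
propertynames(w)      # (:data,)  — needs the value `w` to compute
```

A field-based traversal reconstructs `Wrapped` from `_data` directly; a property-based one would go through `data`, which is only discoverable from an instance.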