Kyle Daruwalla
Oh sorry, I posted my comment before the page refreshed with yours. I think all we need to provide is a parallelized version of `NNlib.batched_mul` for 4D inputs. It could...
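To make the 4D idea concrete, here is a rough sketch that uses only the standard library. `batched_mul4d` is a hypothetical name, not an NNlib function, and this is not necessarily the design the comment goes on to describe. It multiplies matching matrix slices and spreads the trailing two batch dimensions across threads.

```julia
using LinearAlgebra

# Hypothetical helper (not part of NNlib): C[:, :, i, j] = A[:, :, i, j] * B[:, :, i, j]
function batched_mul4d(A::AbstractArray{T,4}, B::AbstractArray{T,4}) where {T}
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("inner dimensions must match"))
    size(A)[3:4] == size(B)[3:4] || throw(DimensionMismatch("batch dimensions must match"))
    C = similar(A, size(A, 1), size(B, 2), size(A, 3), size(A, 4))
    batches = CartesianIndices((size(A, 3), size(A, 4)))
    # Each batch entry writes to its own slice of C, so iterations are independent.
    Threads.@threads for k in 1:length(batches)
        i, j = Tuple(batches[k])
        @views mul!(C[:, :, i, j], A[:, :, i, j], B[:, :, i, j])
    end
    return C
end
```

Start Julia with more than one thread (e.g. `julia -t 4`) for the loop to actually run in parallel; with a single thread it still gives the same result.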
I'm not sure what you mean. I'm pretty sure everyone in this thread wants to add the layer to Flux if that's what you're getting at.
I would prefer Pollen.jl because it accepts more formats including notebooks. For the last point, I would prefer to have a developer section directly in the docs (where contributing already...
Yeah, changing the weights won't help, because the undefined reference it is complaining about is an index into `y` in `conv_direct!` (i.e. the output). This is created by NNlib using...
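The failure mode can be reproduced without NNlib. The sketch below assumes the output buffer is allocated uninitialized and holds a non-bits element type (`BigFloat` is used here purely as a stand-in; the comment is cut off before it says how `y` is actually created). Reading such a slot throws even when it is multiplied by zero.

```julia
# Assumption: an uninitialized output buffer with a non-bits element type.
y = Vector{BigFloat}(undef, 3)
alpha, beta = 1.0, 0.0
dotprod = big"2.0"

# Reading an unassigned slot throws UndefRefError, even though beta is zero.
threw = try
    alpha * dotprod + beta * y[1]
    false
catch err
    err isa UndefRefError
end

# A write that never reads y succeeds.
y[1] = alpha * dotprod
```

`isassigned(y, i)` reports whether a slot holds a value yet, which is a quick way to check whether an array is in this state.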
One fix would be to change L111 and L154 of `impl/conv_direct.jl` to
```julia
# old
y[w_idx, h_idx, d_idx, c_out, batch] = alpha*dotprod + beta*y[w_idx, h_idx, d_idx, c_out, batch]
# new...
```
@CarloLucibello want to rebase so we can merge?
Is the original message up top still accurate? It looks like the implementation is there. What help is necessary to get this through?
If you have an implementation, then a PR would be welcome! We can iterate the design there.
For the reshape issue, have you looked at what the output of `chunk` is? I suspect it isn't a `CuArray` but a view instead. `permutedims` just turns it back into...
Typically, CUDA.jl will return `CuArray`s instead of `SubArray`s for views, so long as the values are contiguous. The use of `selectdim` in `chunk` will try to take a view. If...
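The contiguity distinction can be seen on the CPU with plain `Array`s. This is only an illustration of which `selectdim` slices are contiguous in memory; it does not exercise CUDA.jl, and `is_dense` is a hypothetical helper written for this example.

```julia
A = reshape(collect(1.0:24.0), 4, 3, 2)

last_slice = selectdim(A, 3, 1)   # equivalent to view(A, :, :, 1)
first_slice = selectdim(A, 1, 1)  # equivalent to view(A, 1, :, :)

# Hypothetical helper: in column-major layout, an array is dense when each
# stride equals the product of the sizes of the dimensions before it.
function is_dense(v)
    expected = 1
    for (st, sz) in zip(strides(v), size(v))
        st == expected || return false
        expected *= sz
    end
    return true
end

dense_last = is_dense(last_slice)    # slicing the last dimension keeps memory contiguous
dense_first = is_dense(first_slice)  # slicing the first dimension leaves gaps between elements
```

On the CPU both results are `SubArray`s. Going by the comment above, the contiguous case is the one where CUDA.jl would typically hand back a `CuArray`.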