Kyle Daruwalla comments

Results: 404 comments of Kyle Daruwalla

Move MHAttention layer to Flux

Oh sorry, I posted my comment before the page refreshed with yours. I think all we need to provide is a parallelized version of `NNlib.batched_mul` for 4D inputs. It could...
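To make the 4D idea concrete, here is a minimal sketch that folds the two trailing batch dimensions into one, calls the existing 3D `NNlib.batched_mul`, and unfolds the result. The name `batched_mul_4d` and the `(rows, cols, heads, batch)` layout are assumptions for illustration only; this is not the implementation discussed in the thread.

```julia
using NNlib: batched_mul

# Hypothetical helper (not part of NNlib): multiply 4D arrays slice by slice
# by merging the last two dimensions into a single batch dimension.
function batched_mul_4d(A::AbstractArray{T,4}, B::AbstractArray{T,4}) where {T}
    m, k, h, b = size(A)
    k2, n, h2, b2 = size(B)
    (k, h, b) == (k2, h2, b2) ||
        throw(DimensionMismatch("inner or batch dimensions do not match"))
    A3 = reshape(A, m, k, h * b)   # reshape shares memory for dense arrays
    B3 = reshape(B, k, n, h * b)
    C3 = batched_mul(A3, B3)       # existing 3D batched multiply
    return reshape(C3, m, n, h, b)
end

A = rand(Float32, 3, 4, 2, 5)
B = rand(Float32, 4, 6, 2, 5)
C = batched_mul_4d(A, B)
@show size(C)
```

Because the reshape only reinterprets the dimensions of a dense array, whatever parallelism the 3D routine provides carries over to the merged batch dimension.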

Move MHAttention layer to Flux

I'm not sure what you mean. I'm pretty sure everyone in this thread wants to add the layer to Flux if that's what you're getting at.

Documentation revamp

I would prefer Pollen.jl because it accepts more formats including notebooks. For the last point, I would prefer to have a developer section directly in the docs (where contributing already...

Convolutions error using Measurements.jl

Yeah, changing the weights won't help, because the undefined reference it is complaining about is an index into `y` in `conv_direct!` (i.e. the output). This is created by NNlib using...
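The failure mode described here can be reproduced in plain Julia without NNlib. The sketch below uses `BigFloat` as a stand-in for any element type that is not stored inline (an assumption for illustration; it is not the type from the issue): an uninitialized output array of such a type holds undefined references, so reading an element fails even when it is about to be multiplied by zero.

```julia
# BigFloat stands in for a non-isbits element type (illustrative assumption).
y = Vector{BigFloat}(undef, 2)   # slots are unassigned references, not garbage numbers
@show isassigned(y, 1)

alpha, beta = 1, 0
dotprod = big"2.0"

# Reading y[1] on the right-hand side throws before beta can zero it out.
err = try
    y[1] = alpha * dotprod + beta * y[1]
    nothing
catch e
    e
end
@show err isa UndefRefError

# Writing without reading the old value works fine.
y[2] = alpha * dotprod
@show y[2]
```

For plain `Float32` or `Float64` outputs the same read returns arbitrary bits instead of throwing, which is why the problem only surfaces with wrapper number types.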

Convolutions error using Measurements.jl

One fix would be to change L111 and L154 of `impl/conv_direct.jl` to
```julia
# old
y[w_idx, h_idx, d_idx, c_out, batch] = alpha*dotprod + beta*y[w_idx, h_idx, d_idx, c_out, batch]
# new...
```

docs improve for ViT

@CarloLucibello want to rebase so we can merge?

Add gradients for `conv_bias_act`, and a similar `dense_bias_act`

Is the original message up top still accurate? It looks like the implementation is there. What help is necessary to get this through?

add Sparsemax activation

If you have an implementation, then a PR would be welcome! We can iterate the design there.

Issues with `ViT` on the GPU

For the reshape issue, have you looked at what the output of `chunk` is? I suspect it isn't a `CuArray` but a view instead. `permutedims` just turns it back into...

Issues with `ViT` on the GPU

Typically, CUDA.jl will return `CuArray`s instead of `SubArray`s for views, so long as the values are contiguous. The use of `selectdim` in `chunk` will try to take a view. If...
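A CPU-only sketch of the view behaviour: `selectdim` returns a `SubArray` rather than a dense array, and slicing along the first dimension produces a strided (non-contiguous) view. The variable names are hypothetical, and the GPU types are deliberately left out since this runs without CUDA.jl; materializing with `copy` is shown as one possible workaround, not as the fix adopted in the thread.

```julia
x = reshape(collect(1:24), 2, 3, 4)

# Slicing along the last dimension selects one contiguous block of memory.
v_last = selectdim(x, 3, 1:2)
# Slicing along the first dimension selects strided, non-contiguous elements.
v_first = selectdim(x, 1, 1:1)

@show typeof(v_last) <: SubArray
@show typeof(v_first) <: SubArray

# Materializing the view gives a dense array that reshapes like any other.
dense = copy(v_first)
y = reshape(dense, 3, 4)
@show typeof(dense)
@show size(y)
```

On the CPU both slices are `SubArray`s; the contiguous/non-contiguous distinction matters on the GPU, where, per the comment above, only the contiguous case comes back as a `CuArray`.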