Abhirath Anand
Alright, now that #200 is in, this is the next PR we need to land for the docs transition to go through. A rebase should do the trick. I think...
Thanks a lot for this @Saransh-cpp! This is going to make my life a lot easier with the new docs structure 😄
The other thing is the use of NeuralAttentionlib, which last I checked slowed the model down a _lot_ on the CPU. Not sure if there have been any changes there -...
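(For anyone wanting to re-check that claim, a minimal CPU timing sketch with BenchmarkTools; the model and input shapes below are stand-ins, not Metalhead's actual attention setup:)

```julia
using BenchmarkTools, Flux

# Stand-in model and input purely for illustration; substitute the actual
# Metalhead model whose attention layer is under suspicion.
model = Chain(Dense(128 => 128, gelu), Dense(128 => 10))
x = rand(Float32, 128, 32)

@btime $model($x)                          # forward pass on the CPU
@btime gradient(m -> sum(m($x)), $model)   # gradient time, often the real cost
```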
That sounds great! Metalhead can simply call `using MetalheadModels` then. One question I have regarding this PR, though, is that the major increase in the time taken by `using Metalhead` seems to...
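(A quick way to pin down where that load time goes, assuming Julia 1.8+; `@time_imports` lives in InteractiveUtils, which the REPL loads by default:)

```julia
using InteractiveUtils  # for @time_imports (Julia 1.8+)

# Prints one timing line per package pulled in, so it's easy to see whether
# NeuralAttentionlib (or some other dependency) dominates the load time.
@time_imports using Metalhead
```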
NeuralAttentionlib already works with more than 3D inputs - one of the reasons I used it as a dep was that it would allow that functionality in the future (see...
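(To make the "more than 3D" point concrete, here is a rough sketch of handling extra batch-like dimensions by folding them together; it uses Flux's generic `MultiHeadAttention` purely for illustration, not NeuralAttentionlib's actual API:)

```julia
using Flux

# 4D input: (features, sequence, extra batch-like dim, batch). Fold the two
# trailing dims into one batch dim, attend, then restore the original shape.
x = rand(Float32, 64, 16, 8, 4)
mha = MultiHeadAttention(64; nheads = 8)

xf = reshape(x, 64, 16, :)        # (64, 16, 32)
y, α = mha(xf)                    # self-attention; y has the same shape as xf
y4 = reshape(y, size(x))          # back to (64, 16, 8, 4)
```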
> So you say that the attention layer in pytorch is hardly used in practice? https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html

No, this one is used quite often... I meant the layer in the form as...
This is what torch and numpy have to say:

> If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last...
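(The Julia-side counterpart of that stacked-matrix behaviour is NNlib's `batched_mul`, except the batch dimension sits last, matching column-major layout, rather than first as in torch/numpy; a small sketch:)

```julia
using NNlib

A = rand(Float32, 3, 4, 10)    # a stack of ten 3×4 matrices
B = rand(Float32, 4, 5, 10)    # a stack of ten 4×5 matrices

# batched_mul multiplies matching slices along the *last* dimension,
# whereas torch/numpy broadcast over the *leading* batch dimensions.
C = batched_mul(A, B)          # size (3, 5, 10)
@assert C[:, :, 1] ≈ A[:, :, 1] * B[:, :, 1]
```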
One of the main questions I have regarding Pollen is whether it supports documenting multiple versions of the same package. This is something I suspect we will need...
Side note - using `basic_conv_bn` for the Inception models seems to have fixed their gradient times somehow, which are now _much_ more manageable. Maybe the extra `bias` parameters were causing a...
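(For context, this is the usual conv + batchnorm argument: a `BatchNorm` right after a `Conv` re-centres the activations, so the conv's own bias is redundant and can be dropped. A rough sketch of the pattern; `basic_conv_bn`'s real signature lives in Metalhead, and the helper below is hypothetical:)

```julia
using Flux

# Hypothetical helper mirroring the conv + BN pattern: the BatchNorm's shift
# (β) subsumes the conv bias, so `bias = false` loses nothing and saves both
# parameters and, apparently, gradient time.
conv_bn(k, in_ch, out_ch) = Chain(
    Conv(k, in_ch => out_ch; bias = false, pad = SamePad()),
    BatchNorm(out_ch, relu),
)

layer = conv_bn((3, 3), 16, 32)
y = layer(rand(Float32, 32, 32, 16, 4))   # WHCN input, output (32, 32, 32, 4)
```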
Intermittent failures for the ConvNeXt and ConvMixer testsets on Julia 1 seem memory-related (the nightly tests pass; very funny thing to suddenly start occurring). The pretrained ResNets will be broken until...