cossio

Results: 152 comments by cossio

@dhairyagandhi96 I'd wait for https://github.com/FluxML/model-zoo/pull/150 to be merged.

> I believe the idea was that with decays and such, it might be possible to hit longer tuple lengths which can slow down compile time. How large can it...

Fixed by https://github.com/FluxML/Flux.jl/pull/1152

Reopening because of https://github.com/FluxML/Flux.jl/pull/1874

> Think really hard about Transformers and ViT too.

What are the particular issues with Transformers?

What's the API to use e.g. `pad_reflect` as defined there from a `Conv` layer?

@ludvigk are you planning to follow up on this? If not, I'd be willing to open a new PR with this.

Regarding the Simple ConvNet (LeNet5) example, I noticed it uses `relu` activations and max-pooling. I don't think the original LeCun paper does this. It is also different from https://d2l.ai/chapter_convolutional-neural-networks/lenet.html,...

Any updates on this? Related: https://github.com/tensorflow/tensorflow/issues/35617