Menshykov
So I've been trying to research grouped convolutions, but just found out today that these guys have already gone deep into this with a lot of hardware: https://arxiv.org/pdf/1605.06489v1.pdf Proves...
Okay, now that ResNeXt is out (https://arxiv.org/pdf/1611.05431.pdf), I'm hoping I'm not the only one who understands the importance of native grouped convolutions here? Since groups are exactly the only...
https://arxiv.org/pdf/1611.05431.pdf **Performance**. For simplicity we use Torch’s built-in grouped convolution implementation, without special optimization. We note that this implementation was brute-force and not parallelization-friendly. On 8 GPUs of NVIDIA M40,...
Actually, taking a closer look, Kaiming's paper doesn't add much novelty over https://arxiv.org/pdf/1605.06489v1.pdf, which I've already linked to; it's basically a follow-up on that study, more of a...
NVIDIA said they're planning to release an implementation of groups in their next cuDNN.
https://developer.nvidia.com/cudnn so grouped convolutions are now available in cuDNN v7.
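For anyone who wants to play with the semantics while native support lands everywhere, here's a minimal sketch using PyTorch's `groups` argument (just an illustration, not the Torch implementation quoted above): the input channels are split into `groups` independent slices, each convolved with its own set of filters.

```python
import torch
import torch.nn as nn

# Grouped convolution sketch: 64 input channels split into 32 groups,
# so each group of 2 channels is convolved independently into 4 output channels.
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3,
                 padding=1, groups=32)

x = torch.randn(1, 64, 56, 56)   # (batch, channels, height, width)
y = conv(x)
print(y.shape)                   # torch.Size([1, 128, 56, 56])

# Weight count drops by a factor of `groups` compared to a dense conv:
# 128 * (64 / 32) * 3 * 3 = 2304 instead of 128 * 64 * 3 * 3 = 73728.
print(sum(p.numel() for p in conv.parameters() if p.dim() > 1))  # 2304
```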
It would be a good idea to test the influence of the batch size used at LSUV init time on large networks with highly variant data. It seems from the paper that you've only tested this...
Yes, it actually is not very shuffled, which means that I have to use a larger batch here to get something closer to what I would get with a smaller one...
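For context, a rough sketch of what varying the LSUV init batch could look like, assuming a PyTorch-style model (hypothetical illustration code, not the implementation discussed here; the original LSUV procedure also does an orthonormal pre-init that this skips):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def lsuv_init(model, init_batch, tol=0.1, max_iters=10):
    """Rescale each conv/linear layer so its output variance on
    `init_batch` is close to 1 (the LSUV criterion)."""
    model.eval()
    for module in model.modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        captured = {}
        handle = module.register_forward_hook(
            lambda _m, _inp, out, store=captured: store.update(out=out))
        for _ in range(max_iters):
            model(init_batch)                     # forward pass to capture this layer's output
            var = captured["out"].var().item()
            if abs(var - 1.0) < tol:
                break
            module.weight.mul_(1.0 / var ** 0.5)  # rescale towards unit output variance
        handle.remove()

# A larger init batch gives a less noisy variance estimate,
# which matters when the data is not well shuffled.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(32 * 32 * 32, 10))
init_batch = torch.randn(256, 3, 32, 32)          # try different batch sizes here
lsuv_init(model, init_batch)
```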
It would be great if you noted the time it took the different setups to converge, both in epochs and in wall-clock time.
Yeah, I guess that would just take `import time`, `start = time.time()`, `end = time.time()`, logging the elapsed time to a separate file during saves and reading it back during loads. Not...
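Something along these lines, as a minimal sketch (the log file name and hook points are made up for illustration):

```python
import json
import time

TIMING_LOG = "training_time.jsonl"   # hypothetical log file name

def save_timing(epoch, epoch_start, total_elapsed):
    """Append per-epoch wall-clock time alongside the model checkpoint."""
    entry = {"epoch": epoch,
             "epoch_seconds": time.time() - epoch_start,
             "total_seconds": total_elapsed}
    with open(TIMING_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")

def load_timing():
    """Read the timing log back when resuming from a checkpoint."""
    try:
        with open(TIMING_LOG) as f:
            return [json.loads(line) for line in f]
    except FileNotFoundError:
        return []

# Usage inside a training loop:
run_start = time.time()
for epoch in range(3):
    epoch_start = time.time()
    # ... train one epoch, save checkpoint ...
    save_timing(epoch, epoch_start, time.time() - run_start)
print(load_timing())
```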