Przemyslaw Tredak
I don't think it would make a big difference (and the user is free to change it to a lower value if they want) - the indexed recordio version of shuffle is much...
@piiswrong @zhreshold Do you have any other comments?
This is only partially true (and the issue should not be closed). Downsample is one of the convolutions that should have stride 2 (and it does, as you pointed out,...
Alternatively, you can use the NGC container: https://ngc.nvidia.com/catalog/containers/nvidia:mxnet , version 20.10 supports `sm_86` (so the RTX 30 series).
This is a legitimate failure - we are using C++17, which does not require a message in `static_assert`, but earlier standards do (and 1.x uses an older C++ standard) - you...
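
To illustrate the `static_assert` difference mentioned above, here is a minimal sketch (not taken from the MXNet code base; `is_supported_index_type` is a hypothetical helper used only for illustration). The two-argument form compiles under C++11 and later, while the message-less form is only accepted from C++17 onward:

```cpp
#include <type_traits>

// Portable form: the message argument is required by C++11 and C++14,
// and is still accepted by C++17 and later.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

#if __cplusplus >= 201703L
// Message-less form: only valid from C++17 onward, so it is guarded here
// to keep this file compiling under older standards.
static_assert(std::is_integral<int>::value);
#endif

// Hypothetical helper (an assumption for illustration only): rejects
// non-integral index types at compile time, using the portable form.
template <typename T>
constexpr bool is_supported_index_type() {
  static_assert(std::is_integral<T>::value, "index type must be integral");
  return true;
}
```

Code that has to build on a branch using an older standard would presumably keep the message argument everywhere, since that form is valid under every standard from C++11 on.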
Could you try a CUDA 11.x release? I believe there was a change in 11.2 that should help here.
Hi @rationalism, Llama is actually supported by TE's LayerNormMLP module via the `swiglu` activation. For performance reasons we fuse the 2 Linear layers into a single one. I recommend looking...
Closing this issue since GLU activations are supported in TE and there has been no activity here for over a month. Please feel free to reopen if you believe that we...
Hi @sirutBuasai, what is the cuDNN version you are using?
@cyanguwa I think we should still catch this error from cuDNN Frontend and just disable cuDNN's implementation of attention in this case.