fast-LayerNorm-TF
Efficient layer normalization GPU kernel for TensorFlow
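For orientation, here is a minimal NumPy sketch of what layer normalization computes along the last dimension. It is not this repository's GPU kernel. The function name `layer_norm_reference`, the `gamma`/`beta` parameter names, and the default `epsilon` are assumptions chosen for illustration.

```python
import numpy as np


def layer_norm_reference(x, gamma, beta, epsilon=1e-5):
    """Reference layer normalization over the last axis (hypothetical helper).

    x:     array of shape (..., depth)
    gamma: scale, broadcastable to shape (depth,)
    beta:  shift, broadcastable to shape (depth,)
    """
    # Per-row statistics over the last dimension only.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    # Standardize, then apply the learned scale and shift.
    normalized = (x - mean) / np.sqrt(var + epsilon)
    return gamma * normalized + beta


# Usage: normalize a small batch with identity scale and zero shift.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8))
out = layer_norm_reference(x, gamma=np.ones(8), beta=np.zeros(8))
```

A sketch like this can serve as a slow baseline for checking the output of a fused kernel on small inputs.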
Hi, I'm interested in using this for a language model, which will need a last dimension larger than 5120. How would I go about raising this limit?
Hey Chiu, unfortunately I'm getting an error when I run both .so files: `tensorflow.python.framework.errors_impl.NotFoundError: layer_norm_fused_op.so: undefined symbol: _ZN10tensorflow8internal10LogMessage12MinVLogLevelEv` I could try to dig in and modify the C++ code but...
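As a diagnostic aid, the mangled name in the error above can be decoded by hand. The sketch below handles only the simplest Itanium C++ ABI form (a nested name with no arguments); the helper `demangle_nested_name` is hypothetical, and a real tool such as `c++filt` covers the general case. That the missing symbol comes from a mismatch between the TensorFlow build the .so was compiled against and the installed one is an assumption, not something the report confirms.

```python
def demangle_nested_name(symbol: str) -> str:
    """Decode symbols of the form _ZN<len><name>...Ev (hypothetical helper)."""
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    i = 3
    parts = []
    # Each component is a decimal length followed by that many characters.
    while i < len(symbol) and symbol[i].isdigit():
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])
        i = j + length
    # "E" closes the nested name; "v" means the function takes no arguments.
    if symbol[i:] != "Ev":
        raise ValueError("only argument-less nested names are supported")
    return "::".join(parts) + "()"


missing = "_ZN10tensorflow8internal10LogMessage12MinVLogLevelEv"
decoded = demangle_nested_name(missing)
print(decoded)
```

The decoded name shows which function the shared object expects the loaded TensorFlow library to export.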
fixes https://github.com/MycChiu/fast-LayerNorm-TF/issues/2