fast-LayerNorm-TF

Errors on Using .so Files

Open NickShahML opened this issue 9 years ago • 21 comments

Hey Chiu,

Unfortunately, I'm getting an error when I run both .so files:

tensorflow.python.framework.errors_impl.NotFoundError: layer_norm_fused_op.so: undefined symbol: _ZN10tensorflow8internal10LogMessage12MinVLogLevelEv

I could try to dig in and modify the C++ code, but I'm also running into problems when I run make on my machine. Perhaps I can investigate this further. Just wanted to see if you run into the same error?

NickShahML avatar Jan 17 '17 01:01 NickShahML

Yes, I just ran the test again with the newly compiled .so file, and it worked fine. I am really new to C/C++, so I am not sure about this, but does a .so file work even if it was compiled in a different environment? Maybe you could tell me the error you get when you run make on your computer; I might be able to help with it.

MycChiu avatar Jan 17 '17 03:01 MycChiu

Hi @MycChiu, I'm having a similar problem. make runs without issues, but then when I import the op I get this error:

NotFoundError: layer_norm_fused_op.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev

here is the output of make:

user@xxx:# make
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
nvcc -std=c++11 -c -o layer_norm_fused_op_gpu.cu.o layer_norm_fused_op_gpu.cu.cc \
-I/usr/local/lib/python2.7/dist-packages/tensorflow/include -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC --expt-relaxed-constexpr -arch=sm_60
g++ -std=c++11 -shared -o layer_norm_fused_op.so register_ops.cc layer_norm_fused_op.h \
layer_norm_fused_grad_op.cc layer_norm_fused_op.cc layer_norm_fused_op_gpu.cu.o \
-I/usr/local/lib/python2.7/dist-packages/tensorflow/include -L /usr/local/cuda/lib64/ -fPIC -lcudart -O2 -DNDEBUG

Thanks for the help! (Also, any news on the merge into TF? It seems like the tests have failed over there.)

cloofa avatar Feb 13 '17 19:02 cloofa
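As an aside for readers hitting this: undefined symbols whose mangled names contain B5cxx11 (like CheckOpMessageBuilder9NewStringB5cxx11Ev above) usually indicate a C++11 ABI mismatch, since pre-built TensorFlow pip wheels of this era were compiled with the old GCC ABI while g++ 5 and later default to the new one. A hedged sketch of the usual fix, reusing the exact g++ line from the make output above with one flag added at the end:

```shell
# Sketch: add -D_GLIBCXX_USE_CXX11_ABI=0 so the op is built against the same
# (old) libstdc++ ABI as the pip-installed TensorFlow wheel. All other flags
# are copied unchanged from the make output above.
g++ -std=c++11 -shared -o layer_norm_fused_op.so register_ops.cc layer_norm_fused_op.h \
    layer_norm_fused_grad_op.cc layer_norm_fused_op.cc layer_norm_fused_op_gpu.cu.o \
    -I/usr/local/lib/python2.7/dist-packages/tensorflow/include -L /usr/local/cuda/lib64/ \
    -fPIC -lcudart -O2 -DNDEBUG -D_GLIBCXX_USE_CXX11_ABI=0
```

Whether the flag is needed depends on how your TensorFlow was built; if you compiled TensorFlow from source with the new ABI, adding it would cause the opposite mismatch.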

@cloofa Hi, sorry for the long delay; I didn't realize Chrome had logged me out of GitHub. Your make output looks alright, but which Nvidia card are you using? Is it a GTX 10xx series? If not, please modify the -arch=sm_60 option in the makefile according to your card's compute capability.

MycChiu avatar Feb 18 '17 14:02 MycChiu

Hey @MycChiu, thanks for trying to help us both out. I've been working on this on both the Maxwell and Pascal fronts, and both result in the same error regardless of the -arch=sm value used.

NickShahML avatar Feb 18 '17 23:02 NickShahML

@NickShahML were you able to build with make without errors, or were you using the .so files provided with the repo? The only time I ran into this issue after successfully building the .so file was when I had built it with a different version of TensorFlow than the one I ran it with. Is it possible that you have two installations of TensorFlow on your system, and that the line TF_INC :=$(shell python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())') in the makefile detects the one you don't usually use?

MycChiu avatar Feb 19 '17 02:02 MycChiu
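One quick way to test the two-installations hypothesis is to print where the interpreter actually resolves the tensorflow package from, and compare that path against the include path shown in the make output. A minimal sketch (it works even when tensorflow is not installed):

```python
import importlib.util
import sys

# Report which tensorflow installation this interpreter would import,
# so it can be compared against the include path the Makefile picked up.
spec = importlib.util.find_spec("tensorflow")
if spec is None:
    print("tensorflow is not importable from", sys.executable)
else:
    print("tensorflow resolves to", spec.origin)
```

If the printed path disagrees with the `-I.../tensorflow/include` path in the make output, the .so was built against a different TensorFlow than the one being imported.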

@NickShahML @cloofa , I think I found the culprit! The latest commit should (hopefully) solve the undefined symbol error.

MycChiu avatar Feb 20 '17 13:02 MycChiu

@MycChiu Thanks again for all the effort in this! I'll give it a shot now and let you know if it works fine for me.

cloofa avatar Feb 20 '17 19:02 cloofa

Hey @MycChiu, sorry for the late reply. I tested your updates, and the good news is that the error is gone! It is also incredibly fast (~10% more time compared to a vanilla RNN), which is simply incredible. The memory footprint is also next to nothing.

However, when I actually run the layer norm on a standard LSTM or GRU, the loss goes to infinity. This is on a simple language task. When I use TF's layer norm, it converges as normal.

I also noticed that when I run your layer_norm_fused_test.py there are 20 failures out of 27. I'm running these on Titan X Maxwell cards. Maybe it only works on Pascal cards? I'm also running on TensorFlow 1.0.0.

Have you tried using these ops on RNNs? Unfortunately, your layer norm causes exploding gradients when I use it.

NickShahML avatar Mar 01 '17 14:03 NickShahML
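One way to narrow down divergence like this is to compare the fused op's output against a plain NumPy reference implementation of layer normalization. A minimal sketch under the usual definition (the epsilon value and shapes here are illustrative, not the op's actual defaults):

```python
import numpy as np

def layer_norm_ref(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Reference layer norm: normalize over the last axis, then apply affine."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.default_rng(0).standard_normal((4, 8)).astype(np.float32)
y = layer_norm_ref(x)
# Each row should come out with roughly zero mean and unit variance.
print(np.allclose(y.mean(axis=-1), 0.0, atol=1e-4))
print(np.allclose(y.var(axis=-1), 1.0, atol=1e-2))
```

Feeding the same inputs through this reference and through the fused op, and diffing both the outputs and the gradients, can tell whether the blow-up comes from the forward pass or the backward pass.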

Hmmm... Yes, I have been using them on all my models, and they worked properly. After reading through the code again, I think the culprit might be the atomicAdd function: it uses a custom implementation for cards older than Pascal, which fits your hypothesis as well. I will try to fix it now.

MycChiu avatar Mar 03 '17 03:03 MycChiu

I fixed it, but after looking at it again, the bug should only affect inputs with float64 dtype, and I forced float32 in the tests, so it may not be related to your problem. However, please do try again if you get the chance, and let me know if the problem still persists.

MycChiu avatar Mar 03 '17 04:03 MycChiu
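A side note on why kernel tests like these compare with per-element tolerances rather than exact equality: floating-point addition is not associative, so a reduction performed in a different order (for example via unordered atomic adds in a CUDA kernel) can legitimately differ by small amounts from a sequential sum. A quick NumPy illustration (the array size is arbitrary):

```python
import numpy as np

# Sum the same float32 values in two different orders; because
# floating-point addition is not associative, the results can differ
# slightly, which is why kernel tests compare with a tolerance.
vals = np.random.default_rng(0).standard_normal(100_000).astype(np.float32)
forward = float(np.sum(vals))
reverse = float(np.sum(vals[::-1]))
print(abs(forward - reverse))  # tiny, though generally nonzero
```

Differences on the order of the tolerance are expected; the order-1 differences in the log below are far too large for this effect and point to a genuine bug.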

Hey @MycChiu, thanks for attempting the fix, but unfortunately the model still blows up immediately. I'm just not sure what is causing this, especially since it works for your own RNNs =(. At this point, I think getting it to pass TensorFlow's checks may reveal the error. I really appreciate your time on this, as it would be very powerful to have this working!

NickShahML avatar Mar 04 '17 23:03 NickShahML

Yeah, failing the tests is definitely unusual. It would be nice if you could let me know which 7 of the tests actually passed, along with some of the output from the failed tests.

MycChiu avatar Mar 06 '17 02:03 MycChiu

Here is the entire readout I get. Sorry for such a lengthy read, but I figure the more the merrier. Edit (MycChiu): I took the liberty of reformatting the log so it looks clearer.

I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([   0,    1,    2, ..., 2997, 2998, 2999]),)
not close lhs =  [ 1.  1.  1. ...,  0.  0.  0.]
not close rhs =  [-0.10250235  0.31227398  1.27090645 ...,  1.54781079  0.08999836
 -0.48522937]
not close dif =  [ 1.10250235  0.68772602  0.27090645 ...,  1.54781079  0.08999836
  0.48522937]
not close tol =  [ 0.00020189  0.0002403   0.00041585 ...,  0.00046655  0.0001996
  0.00027197]
dtype = float32, shape = (3000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 639997, 639998, 639999]),)
not close lhs =  [ 0.  0.  0. ...,  0.  0.  0.]
not close rhs =  [-0.13142765  0.31940711  1.23660862 ..., -0.9077481  -0.12797761
  0.36428559]
not close dif =  [ 0.13142765  0.31940711  1.23660862 ...,  0.9077481   0.12797761
  0.36428559]
not close tol =  [ 0.00026986  0.00031469  0.00053345 ...,  0.00045501  0.00026903
  0.00032539]
dtype = float32, shape = (640000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 159997, 159998, 159999]),)
not close lhs =  [ 0.44852987  0.94589144  0.18774445 ...,  0.41391799  0.29672188
  0.12436157]
not close rhs =  [-0.2498666   0.03294075  1.00781751 ...,  0.0356741  -0.93506682
  1.28256679]
not close dif =  [ 0.69839644  0.91295069  0.82007307 ...,  0.37824389  1.23178864
  1.15820527]
not close tol =  [ 0.00045259  0.00037404  0.00072706 ...,  0.00037503  0.00070071
  0.00082655]
dtype = float32, shape = (160000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([   0,    1,    2, ..., 2997, 2998, 2999]),)
not close lhs =  [ 0.81279415  0.84384191  0.2710382  ...,  0.18715858  0.4909662
  0.42726046]
not close rhs =  [-1.11246312  0.94082093  0.29847884 ...,  0.14951777  1.15302885
 -0.80257875]
not close dif =  [ 1.92525721  0.09697902  0.02744064 ...,  0.03764081  0.66206264
  1.22983921]
not close tol =  [ 0.00105623  0.00097041  0.00064924 ...,  0.00057476  0.00107651
  0.00090129]
dtype = float32, shape = (3000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([    0,     1,     2, ..., 39997, 39998, 39999]),)
not close lhs =  [ 0.20310658  0.73494613  0.61084998 ...,  0.68315393  0.01515152
  0.54107183]
not close rhs =  [-0.8456707   1.30370474  0.63130188 ..., -0.3551352   1.69852567
 -0.87971026]
not close dif =  [ 1.04877734  0.56875861  0.0204519  ...,  1.03828907  1.68337417
  1.42078209]
not close tol =  [ 0.00092284  0.00115185  0.00081565 ...,  0.00067757  0.00134926
  0.00093986]
dtype = float32, shape = (40000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
.I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
.I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([   0,    1,    2, ..., 2997, 2998, 2999]),)
not close lhs =  [ 0.42472538  0.37530497  0.99421793 ...,  0.68562156  0.66331702
  0.38412571]
not close rhs =  [-0.10250294  0.31227338  1.27090585 ...,  1.54781127  0.08999908
 -0.48522902]
not close dif =  [ 0.52722836  0.06303158  0.27668792 ...,  0.86218971  0.57331795
  0.86935472]
not close tol =  [ 0.00020189  0.0002403   0.00041585 ...,  0.00046655  0.0001996
  0.00027197]
dtype = float32, shape = (3000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 639997, 639998, 639999]),)
not close lhs =  [ 0.  0.  0. ...,  0.  0.  0.]
not close rhs =  [-0.13142717  0.31940687  1.23660886 ..., -0.9077481  -0.12797761
  0.36428547]
not close dif =  [ 0.13142717  0.31940687  1.23660886 ...,  0.9077481   0.12797761
  0.36428547]
not close tol =  [ 0.00026986  0.00031469  0.00053345 ...,  0.00045501  0.00026903
  0.00032539]
dtype = float32, shape = (640000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 159997, 159998, 159999]),)
not close lhs =  [ 0.65949059  0.75539988  0.01627109 ...,  0.79502207  0.55700713
  0.46866623]
not close rhs =  [-0.2498666   0.03294075  1.00781751 ...,  0.03567457 -0.93506718
  1.28256679]
not close dif =  [ 0.90935719  0.72245914  0.99154639 ...,  0.7593475   1.49207425
  0.81390059]
not close tol =  [ 0.00045259  0.00037404  0.00072706 ...,  0.00037503  0.00070071
  0.00082655]
dtype = float32, shape = (160000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([   0,    1,    2, ..., 2997, 2998, 2999]),)
not close lhs =  [ 0.64321852  0.99767846  0.87546843 ...,  0.7952162   0.37742803
  0.72094768]
not close rhs =  [-1.11246312  0.94082093  0.29847884 ...,  0.14951777  1.15302885
 -0.80257875]
not close dif =  [ 1.75568163  0.05685753  0.57698959 ...,  0.64569843  0.77560079
  1.52352643]
not close tol =  [ 0.00105623  0.00097041  0.00064924 ...,  0.00057476  0.00107651
  0.00090129]
dtype = float32, shape = (3000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([    0,     1,     2, ..., 39997, 39998, 39999]),)
not close lhs =  [ 0.20310658  0.73494613  0.61084998 ...,  0.68315393  0.01515152
  0.54107183]
not close rhs =  [-0.8456707   1.30370474  0.63130188 ..., -0.3551352   1.69852567
 -0.87971026]
not close dif =  [ 1.04877734  0.56875861  0.0204519  ...,  1.03828907  1.68337417
  1.42078209]
not close tol =  [ 0.00092284  0.00115185  0.00081565 ...,  0.00067757  0.00134926
  0.00093986]
dtype = float32, shape = (40000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
.I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
.I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102, 103,
       104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
       117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
       130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
       143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
       156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168,
       169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181,
       182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194,
       195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
       208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220,
       221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233,
       234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246,
       247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259,
       260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272,
       273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285,
       286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298,
       299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311,
       312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324,
       325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337,
       338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350,
       351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363,
       364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376,
       377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389,
       390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402,
       403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415,
       416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428,
       429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441,
       442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454,
       455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467,
       468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480,
       481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493,
       494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506,
       507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519,
       520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532,
       533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545,
       546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558,
       559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571,
       572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584,
       585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597,
       598, 599]),)
not close lhs =  [ 0.50550514  0.45203111  0.9935928   0.49104482  0.50055796  0.50833219
  0.51327056  0.11103532  0.86709148  0.31329945  0.93447727  0.43122295
  0.94378144  0.92580038  0.18223813  0.31491703  0.04300005  0.584436
  0.03728725  0.91527098  0.55822361  0.6897707   0.6347869   0.27780828
  0.48334709  0.02499153  0.3150138   0.95672464  0.79180473  0.94923317
  0.32366401  0.37897909  0.33054543  0.03656276  0.39066625  0.42707768
  0.87326574  0.27402136  0.24499142  0.74002141  0.47794929  0.96124512
  0.76470685  0.36057717  0.97361523  0.24047625  0.76084077  0.20948157
  0.21138777  0.36420184  0.56928861  0.43578461  0.5908252   0.13232395
  0.33628577  0.41610044  0.75397164  0.68348831  0.41778186  0.89252174
  0.11282995  0.49876541  0.59665143  0.98079741  0.17811832  0.88895679
  0.69679999  0.37974873  0.32128486  0.36797372  0.01767158  0.93800497
  0.01388099  0.50373811  0.68268001  0.59903723  0.16237736  0.56453013
  0.33338344  0.99246639  0.35732922  0.00428055  0.80582166  0.41267049
  0.95511854  0.5880385   0.83806789  0.50486457  0.66589373  0.03760138
  0.86614817  0.61400121  0.3460184   0.26312563  0.29221562  0.83278418
  0.98354536  0.9669596   0.68807858  0.07782424  0.32245818  0.15561508
  0.97274005  0.5139817   0.64785206  0.13471262  0.78992468  0.76319838
  0.51139009  0.58029681  0.54229695  0.53309655  0.49998832  0.38698533
  0.44083118  0.9269011   0.39496827  0.53111833  0.8248722   0.25241479
  0.96281028  0.28781536  0.64950311  0.4949626   0.2468053   0.64421755
  0.14268608  0.31017694  0.97021526  0.94569701  0.73741865  0.88877875
  0.92809862  0.0780699   0.3401311   0.836564    0.39956585  0.14019315
  0.59209621  0.0615247   0.01039144  0.69671339  0.81884646  0.27251479
  0.46443111  0.49299097  0.8264879   0.0294289   0.55187768  0.50632232
  0.91997707  0.35775834  0.45087045  0.86897326  0.30629662  0.54001546
  0.09430943  0.6105051   0.39520031  0.84091324  0.6031903   0.34377256
  0.45655492  0.34711719  0.60307372  0.93436778  0.73433083  0.19160327
  0.66180813  0.87640953  0.9729445   0.70416176  0.23257004  0.7977978
  0.6152271   0.05440135  0.72616071  0.75930488  0.33000654  0.57293326
  0.05478565  0.02433516  0.90872836  0.11817513  0.85816604  0.29659644
  0.72573274  0.80120778  0.77142596  0.33097729  0.460969    0.97101945
  0.49809295  0.37806997  0.49081051  0.82124764  0.47356889  0.30039668
  0.08294202  0.53342128  0.12808308  0.50146377  0.6217553   0.29732147
  0.72022092  0.98927969  0.65321749  0.17109428  0.17958784  0.54061991
  0.5761767   0.85983133  0.03204704  0.99554992  0.85068452  0.11315721
  0.74906147  0.62251556  0.80670583  0.5193913   0.49134317  0.50267541
  0.42630187  0.60842979  0.96950275  0.04815686  0.39965713  0.12360556
  0.50015974  0.96082616  0.62410045  0.93132514  0.39713547  0.72761816
  0.64471006  0.0533253   0.20383687  0.93046385  0.9886536   0.49457636
  0.18594719  0.99147558  0.69506657  0.42136881  0.76755774  0.03372785
  0.75314623  0.72403473  0.88245064  0.98227704  0.2980929   0.85040116
  0.35887545  0.79509813  0.09801001  0.0697429   0.21324028  0.98239601
  0.05702004  0.39770344  0.02351199  0.41102469  0.09954241  0.02658976
  0.38769725  0.96032172  0.56206322  0.11307674  0.77392983  0.94659537
  0.6000343   0.4731158   0.76468664  0.55195302  0.28673869  0.5170393
  0.15481207  0.68458432  0.95028383  0.70255077  0.88408858  0.48704523
  0.68680447  0.24831165  0.03808794  0.53073859  0.55824506  0.27250588
  0.58294129  0.71831161  0.5992837   0.23849171  0.76891822  0.69679028
  0.04076225  0.26691887  0.15224634  0.62702996  0.97542781  0.59259444
  0.34631294  0.77166611  0.53887022  0.0469279   0.02838802  0.01667779
  0.45630825  0.77851403  0.09473498  0.83246231  0.73939276  0.46127218
  0.56958073  0.76512879  0.04334134  0.10701407  0.48516893  0.59187049
  0.46694723  0.89151657  0.22892274  0.45494524  0.70658642  0.03426983
  0.73910153  0.62051576  0.56143266  0.04127588  0.0790786   0.94707352
  0.82397145  0.69568485  0.79698652  0.92084187  0.17190348  0.23775916
  0.39035594  0.11219937  0.15823296  0.18187204  0.29850581  0.84545791
  0.05690642  0.43098772  0.36736962  0.07213821  0.90026426  0.10742454
  0.73556978  0.12059266  0.04945515  0.77182275  0.00553695  0.65555996
  0.46188763  0.50169683  0.35540196  0.68305731  0.91792053  0.02445895
  0.9979437   0.52091533  0.10295664  0.00120695  0.6230644   0.16840185
  0.81737858  0.76034653  0.10108109  0.9908843   0.9718439   0.08852883
  0.69786644  0.04902339  0.67470455  0.28035447  0.82505006  0.86635309
  0.02299462  0.77200115  0.98823887  0.3360664   0.59390837  0.64084488
  0.45982182  0.9053362   0.35953709  0.65852499  0.52518588  0.13342118
  0.37228179  0.51823616  0.79988021  0.07169946  0.31556228  0.10259119
  0.15166874  0.63220769  0.63113427  0.72671956  0.37655193  0.6336711
  0.19663519  0.48870209  0.76830578  0.50838017  0.28209814  0.42535135
  0.80462533  0.27402878  0.62699616  0.61207032  0.18935186  0.60564578
  0.61568838  0.82570779  0.36093149  0.09213867  0.42171538  0.67042041
  0.83321398  0.55981463  0.48169348  0.73259962  0.59451085  0.21546161
  0.83671653  0.28637248  0.51643229  0.70236051  0.84913075  0.87984848
  0.68382186  0.23310202  0.48609567  0.27017999  0.62612325  0.11075785
  0.74261087  0.71053588  0.07480577  0.56080073  0.39127138  0.72329086
  0.1704227   0.23546986  0.91446626  0.04920875  0.64610833  0.22868106
  0.3299042   0.89322168  0.36862153  0.06228219  0.01046041  0.90032661
  0.8584134   0.76430893  0.1536289   0.95172668  0.47003013  0.12127476
  0.65901196  0.56461698  0.65557474  0.63291889  0.5025593   0.01726783
  0.3933014   0.78028977  0.22569644  0.44325373  0.73584127  0.93295306
  0.64146155  0.37394726  0.53157765  0.10798398  0.49414718  0.151813
  0.38368362  0.60413051  0.95869941  0.64668632  0.93460834  0.48997203
  0.97672397  0.34430245  0.82305169  0.09965475  0.84940588  0.18586534
  0.90182489  0.03768349  0.33154044  0.93374687  0.38421524  0.19952112
  0.61996859  0.70234883  0.875341    0.25097999  0.90978283  0.78384513
  0.80990875  0.4955892   0.73133928  0.33891344  0.23890038  0.75716656
  0.7247299   0.05805623  0.01557244  0.78323734  0.33891344  0.2267198
  0.12963866  0.49534398  0.63266587  0.14934     0.74314713  0.03975917
  0.16683508  0.31929529  0.84895706  0.51520431  0.82414526  0.66430444
  0.7870636   0.44354111  0.60674214  0.91205603  0.89608091  0.42798275
  0.01164096  0.47231299  0.68144101  0.89501303  0.36140844  0.44516829
  0.37476009  0.01812538  0.65266484  0.8928529   0.42231801  0.889256
  0.04131282  0.83437514  0.74762696  0.89155704  0.95588768  0.66921443
  0.41529912  0.17342596  0.79086667  0.21849142  0.53883469  0.55524814
  0.49155954  0.44428316  0.4468635   0.39480329  0.60576755  0.08164784
  0.42011255  0.13072212  0.96957904  0.90595245  0.34068274  0.68663085
  0.88615525  0.01308193  0.73065877  0.10049504  0.13814861  0.26981866
  0.71763182  0.46537879  0.58671409  0.64929545  0.25229919  0.29926181
  0.82055265  0.8331778   0.1867394   0.90348923  0.20383509  0.21632543
  0.50127256  0.32759541  0.46659073  0.77193618  0.99924225  0.21417156
  0.60392398  0.08809964  0.01877572  0.35124114  0.48911589  0.02533697]
not close rhs =  [ -1.02502942e-01   3.12273383e-01   1.27090585e+00   2.44885802e-01
  -1.19179237e+00   9.96674299e-02   7.36900449e-01  -1.45905256e-01
   1.48670673e-01  -6.54196858e-01   4.26197886e-01  -5.95450997e-01
   8.68541121e-01   1.51895845e+00  -1.43768787e-02   3.25196147e-01
   2.29993105e-01  -6.53335690e-01   1.37927210e+00   5.34903884e-01
  -7.39464164e-01  -1.36670518e+00   3.28720927e-01  -5.97338319e-01
   1.48220527e+00  -4.85030651e-01   1.49245858e-01   1.43484437e+00
  -6.84927583e-01   1.54992068e+00   4.07705665e-01  -5.93829989e-01
   1.33673155e+00  -7.39172220e-01   7.78493047e-01  -6.66388154e-01
   6.41508937e-01   6.72374368e-01  -4.33041930e-01  -1.06397748e-01
  -8.25397134e-01  -1.21405077e+00  -9.04872775e-01  -1.45611513e+00
  -9.23494816e-01  -4.40571904e-01  -1.48052299e+00  -1.95629835e-01
   9.51048136e-02  -1.36451244e-01  -2.92792916e-01   1.54278636e-01
   7.35601783e-01  -8.05277526e-01  -1.67622447e-01   7.03478217e-01
   2.63475537e-01   1.54556596e+00  -1.22796726e+00  -1.35966563e+00
  -1.19961417e+00   1.05774295e+00  -3.06006670e-02   1.25346911e+00
   9.49606299e-01   4.80681062e-01   1.26472771e+00   1.54952514e+00
   7.26588130e-01  -6.73524022e-01  -8.46063733e-01  -1.45154095e+00
   1.03563011e+00  -3.28270078e-01   3.31366658e-01   8.90650868e-01
  -7.02755094e-01   5.93046069e-01   8.12024236e-01  -2.34045863e-01
   1.20869768e+00   1.87112451e-01  -6.88980579e-01   5.40712714e-01
   5.19232154e-01   1.49441803e+00   1.22349179e+00  -1.71142411e+00
  -1.07235909e+00  -2.23424911e-01  -4.40654993e-01  -5.71821690e-01
   3.06443810e-01   1.66793764e+00   7.68065453e-02   7.01408029e-01
   9.27028298e-01  -4.51027751e-01  -5.65310359e-01   9.47290063e-01
  -5.77736735e-01  -2.64088988e-01  -4.68677163e-01   1.65808570e+00
   1.34654057e+00   5.42809844e-01  -7.29241729e-01   1.49274576e+00
  -1.06884551e+00   2.94290781e-02   1.42328775e+00  -6.50482774e-01
   1.08572721e-01  -5.95306516e-01  -7.50654936e-01  -1.09850383e+00
   2.33004808e-01  -1.57819378e+00  -1.45146811e+00  -8.50892425e-01
  -7.93421984e-01  -5.90575814e-01  -1.44149423e+00  -9.16604698e-01
   2.52244353e-01  -9.28530693e-02  -5.09593248e-01   1.09597814e+00
   1.64315462e-01   1.70968878e+00  -3.43553424e-01  -5.22386193e-01
  -1.64656734e+00  -1.65018117e+00  -1.37330961e+00   1.42047524e-01
  -8.68814290e-01  -1.32242322e+00  -7.22380280e-01  -1.33821750e+00
   1.58617973e-01  -7.16222882e-01   2.64479518e-01  -7.72085071e-01
  -6.94340467e-02  -7.72083879e-01  -1.74885178e+00  -7.45154381e-01
  -7.39354730e-01   2.45240092e-01  -7.98396230e-01  -3.68301511e-01
   1.74912095e-01   1.43878329e+00  -1.34705627e+00   3.79787087e-01
  -4.95802402e-01   1.43701708e+00  -5.69840074e-01   1.04229629e+00
   6.86540961e-01  -1.29114866e-01   7.87200928e-02   5.70152998e-02
  -2.31119394e-01   3.81537318e-01  -1.30258608e+00  -2.37519860e-01
   1.49876130e+00   1.44905448e-01   1.58414829e+00  -3.44236732e-01
   7.02823758e-01   7.82753587e-01   1.05323195e-01   9.32096839e-01
  -6.47236109e-02   2.86189318e-02  -1.66363072e+00   5.11537790e-02
  -1.42407727e+00  -1.35476708e+00   4.50624824e-01   6.74736142e-01
   2.36572027e-01   7.79958844e-01  -1.56743085e+00   9.01900530e-02
  -1.30919170e+00  -4.13309216e-01  -5.25984645e-01   7.15470672e-01
   6.01756930e-01  -8.18324447e-01   1.37914193e+00   2.48732924e-01
   9.22679901e-03  -1.58078206e+00   1.04356611e+00  -1.47306919e-03
   1.38852370e+00   6.14985824e-01   1.46017790e-01  -4.70507026e-01
  -4.32441473e-01  -3.02485824e-01   7.32287288e-01   8.19667459e-01
  -4.80575085e-01   2.84194350e-01  -3.81520867e-01   4.61577058e-01
  -3.27077746e-01   7.53576636e-01  -1.67926323e+00  -1.57141817e+00
   1.53741837e-01  -1.40068066e+00   6.53946519e-01  -2.53978848e-01
   4.02667165e-01  -7.56250739e-01   1.14059007e+00  -1.59906030e-01
  -1.70565200e+00   1.48718417e+00   9.14576650e-01   5.34920692e-02
   6.71740651e-01   1.55428517e+00   4.31843638e-01   4.49212790e-02
   8.11948657e-01   1.23762429e+00  -1.37320280e+00  -6.81748152e-01
  -4.20697331e-01   1.70881402e+00   5.34024358e-01   8.65598917e-02
   2.07386494e-01   1.16850245e+00  -2.21383333e-01   2.06177115e-01
   7.72284389e-01  -1.18877351e+00  -4.63907599e-01  -4.20727611e-01
  -4.56802249e-01   8.37202668e-01   6.40213132e-01   1.99996591e-01
  -1.10095775e+00   4.58405018e-02   8.29735160e-01   1.29136431e+00
  -8.42309773e-01  -1.18019247e+00  -1.03471375e+00   1.48729432e+00
  -2.83389211e-01  -7.53418207e-01   1.16575301e+00   3.67535710e-01
   4.03172851e-01   1.39258623e-01  -1.51269484e+00   7.12190032e-01
   1.65086973e+00  -6.25021577e-01   6.64839149e-01   1.49707401e+00
  -1.18464863e+00   8.03896189e-02  -4.71089721e-01   2.01163411e-01
   7.81647563e-01   8.47054362e-01   4.13716674e-01   9.01193023e-01
  -3.30082536e-01   1.52094877e+00   1.94983006e-01   1.22339487e-01
   1.27397668e+00  -1.16118705e+00   1.58422720e+00  -3.97146702e-01
  -4.41629887e-02  -1.10997367e+00   9.29932475e-01   2.92824149e-01
   4.00000930e-01  -1.36971164e+00  -7.88571656e-01   1.31924522e+00
  -1.16013956e+00  -6.69495940e-01  -5.20578742e-01  -1.55728960e+00
   1.68770206e+00   9.51665521e-01   1.11272871e+00   7.37068295e-01
  -4.46444869e-01   1.28169286e+00  -4.30074573e-01   1.19771838e-01
  -7.62328029e-01  -1.44852614e+00   4.61619973e-01   5.09336352e-01
  -5.48927188e-01   4.93438244e-02   1.63709176e+00   6.30375504e-01
   7.13890910e-01  -1.12687516e+00  -6.87744737e-01   4.24862504e-01
  -1.62633491e+00  -2.78590083e-01   2.05733538e-01  -4.01054621e-02
   1.48576057e+00   4.07074809e-01  -1.10090542e+00  -4.33032155e-01
  -4.55622554e-01   5.99431634e-01   2.39618421e-01   8.06033969e-01
   1.71165586e-01   9.65857506e-03  -9.55707014e-01   7.27297902e-01
   5.15790343e-01  -8.09976995e-01  -3.45618844e-01   1.69536507e+00
   ...
   3.78030419e-01  -1.17682719e+00  -2.32136846e-01  -9.83714581e-01]
not close dif =  [  6.08008087e-01   1.39757723e-01   2.77313054e-01   2.46159017e-01
   ...
   3.59254688e-01   1.52806830e+00   7.21252739e-01   1.00905156e+00]
not close tol =  [ 0.00019752  0.0002351   0.00040684  0.00022303  0.00039267  0.00019701
   ...
   0.00026893  0.00032899  0.00024688  0.00038999  0.00022074  0.00035539]
dtype = float32, shape = (600,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 639997, 639998, 639999]),)
not close lhs =  [ 0.  0.  0. ...,  0.  0.  0.]
not close rhs =  [-0.13142717  0.31940687  1.23660886 ..., -0.9077481  -0.12797761
  0.36428547]
not close dif =  [ 0.13142717  0.31940687  1.23660886 ...,  0.9077481   0.12797761
  0.36428547]
not close tol =  [ 0.00026986  0.00031469  0.00053345 ...,  0.00045501  0.00026903
  0.00032539]
dtype = float32, shape = (640000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 159997, 159998, 159999]),)
not close lhs =  [ 0.44852987  0.94589144  0.18774445 ...,  0.41391799  0.29672188
  0.12436157]
not close rhs =  [-0.2498666   0.03294075  1.00781751 ...,  0.03567457 -0.93506718
  1.28256679]
not close dif =  [ 0.69839644  0.91295069  0.82007307 ...,  0.37824342  1.23178911
  1.15820527]
not close tol =  [ 0.00045259  0.00037404  0.00072706 ...,  0.00037503  0.00070071
  0.00082655]
dtype = float32, shape = (160000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([   0,    1,    2, ..., 2997, 2998, 2999]),)
not close lhs =  [ 0.5940764   0.06128557  0.57121229 ...,  0.49466735  0.62386781
  0.07677684]
not close rhs =  [-1.11246312  0.94082093  0.29847884 ...,  0.14951777  1.15302885
 -0.80257875]
not close dif =  [ 1.70653951  0.87953538  0.27273345 ...,  0.34514958  0.52916104
  0.87935561]
not close tol =  [ 0.00105623  0.00097041  0.00064924 ...,  0.00057476  0.00107651
  0.00090129]
dtype = float32, shape = (3000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([    0,     1,     2, ..., 39997, 39998, 39999]),)
not close lhs =  [ 0.14575288  0.62253463  0.58288687 ...,  0.78448278  0.81433421
  0.71792567]
not close rhs =  [-0.8456707   1.30370474  0.63130188 ..., -0.3551352   1.69852567
 -0.87971026]
not close dif =  [ 0.99142361  0.68117011  0.04841501 ...,  1.13961792  0.88419145
  1.59763598]
not close tol =  [ 0.00092284  0.00115185  0.00081565 ...,  0.00067757  0.00134926
  0.00093986]
dtype = float32, shape = (40000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
.I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
.I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([   0,    1,    2, ..., 2997, 2998, 2999]),)
not close lhs =  [ 0.77207696  0.24185866  0.70046699 ...,  0.19091007  0.72731107
  0.73249215]
not close rhs =  [-0.10250235  0.31227398  1.27090645 ...,  1.54781079  0.08999836
 -0.48522937]
not close dif =  [ 0.87457931  0.07041532  0.57043946 ...,  1.35690069  0.63731271
  1.21772146]
not close tol =  [ 0.00020189  0.0002403   0.00041585 ...,  0.00046655  0.0001996
  0.00027197]
dtype = float32, shape = (3000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 639997, 639998, 639999]),)
not close lhs =  [ 0.  0.  0. ...,  0.  0.  0.]
not close rhs =  [-0.13142765  0.31940711  1.23660862 ..., -0.9077481  -0.12797761
  0.36428559]
not close dif =  [ 0.13142765  0.31940711  1.23660862 ...,  0.9077481   0.12797761
  0.36428559]
not close tol =  [ 0.00026986  0.00031469  0.00053345 ...,  0.00045501  0.00026903
  0.00032539]
dtype = float32, shape = (640000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([     0,      1,      2, ..., 159997, 159998, 159999]),)
not close lhs =  [ 0.44852987  0.94589144  0.18774445 ...,  0.41391799  0.29672188
  0.12436157]
not close rhs =  [-0.2498666   0.03294075  1.00781751 ...,  0.0356741  -0.93506682
  1.28256679]
not close dif =  [ 0.69839644  0.91295069  0.82007307 ...,  0.37824389  1.23178864
  1.15820527]
not close tol =  [ 0.00045259  0.00037404  0.00072706 ...,  0.00037503  0.00070071
  0.00082655]
dtype = float32, shape = (160000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([   0,    1,    2, ..., 2997, 2998, 2999]),)
not close lhs =  [ 0.64610833  0.22868106  0.3299042  ...,  0.00983995  0.90429521
  0.97767526]
not close rhs =  [-1.11246312  0.94082093  0.29847884 ...,  0.14951777  1.15302885
 -0.80257875]
not close dif =  [ 1.75857139  0.71213984  0.03142536 ...,  0.13967782  0.24873364
  1.78025401]
not close tol =  [ 0.00105623  0.00097041  0.00064924 ...,  0.00057476  0.00107651
  0.00090129]
dtype = float32, shape = (3000,)
FI tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
not close where =  (array([    0,     1,     2, ..., 39997, 39998, 39999]),)
not close lhs =  [ 0.20310658  0.73494613  0.61084998 ...,  0.68315393  0.01515152
  0.54107183]
not close rhs =  [-0.8456707   1.30370474  0.63130188 ..., -0.3551352   1.69852567
 -0.87971026]
not close dif =  [ 1.04877734  0.56875861  0.0204519  ...,  1.03828907  1.68337417
  1.42078209]
not close tol =  [ 0.00092284  0.00115185  0.00081565 ...,  0.00067757  0.00134926
  0.00093986]
dtype = float32, shape = (40000,)
F.
======================================================================
FAIL: testCenterGradient2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 189, in testCenterGradient2DInput
    self.doGradientTest((10, 300), center=True, scale=False)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.00018312, atol=0.00018312

(mismatch 100.0%)
 x: array([ 1.,  1.,  1., ...,  0.,  0.,  0.], dtype=float32)
 y: array([-0.102502,  0.312274,  1.270906, ...,  1.547811,  0.089998,
       -0.485229], dtype=float32)
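
For anyone reading these failures: the check that trips here is NumPy's `assert_allclose` (which TensorFlow's `assertAllClose` delegates to). A value counts as "not close" when `|actual - desired| > atol + rtol * |desired|`, which is where the per-element `not close tol` arrays above come from. A minimal sketch of that check, using values taken from the dump above (the `0.000238509` tolerance is the rtol/atol reported by `testCenterGradient4DInput`):

```python
import numpy as np

# assert_allclose flags a value as "not close" when
# |actual - desired| > atol + rtol * |desired|.
# With tolerances this tight (~2e-4), a gradient that comes back as 0.0
# where the reference is ~0.36 fails by several orders of magnitude.
actual = np.array([0.0, 0.0], dtype=np.float32)
desired = np.array([-0.13142717, 0.36428547], dtype=np.float32)

tol = 0.000238509  # used as both rtol and atol in the failing test
diff = np.abs(actual - desired)
allowed = tol + tol * np.abs(desired)  # atol + rtol * |desired|
print(diff > allowed)  # both elements exceed the allowed tolerance
```

So the mismatch percentages near 100% mean the custom op's output/gradients are wholesale wrong (likely the ABI/undefined-symbol issue above), not a marginal precision problem.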

======================================================================
FAIL: testCenterGradient4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 192, in testCenterGradient4DInput
    self.doGradientTest((100, 10, 10, 64), center=True, scale=False)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000238509, atol=0.000238509

(mismatch 99.9815625%)
 x: array([ 0.,  0.,  0., ...,  0.,  0.,  0.], dtype=float32)
 y: array([-0.131428,  0.319407,  1.236609, ..., -0.907748, -0.127978,
        0.364286], dtype=float32)

======================================================================
FAIL: testCenterGradient4DSmallDepth (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 195, in testCenterGradient4DSmallDepth
    self.doGradientTest((100, 10, 10, 16), center=True, scale=False)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000362113, atol=0.000362113

(mismatch 99.95875%)
 x: array([ 0.44853 ,  0.945891,  0.187744, ...,  0.413918,  0.296722,
        0.124362], dtype=float32)
 y: array([-0.249867,  0.032941,  1.007818, ...,  0.035674, -0.935067,
        1.282567], dtype=float32)

======================================================================
FAIL: testCenterOutput2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 125, in testCenterOutput2DInput
    self.doOutputTest((10, 300), center=True, scale=False)
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.9%)
 x: array([ 0.812794,  0.843842,  0.271038, ...,  0.187159,  0.490966,  0.42726 ], dtype=float32)
 y: array([-1.112463,  0.940821,  0.298479, ...,  0.149518,  1.153029,
       -0.802579], dtype=float32)

======================================================================
FAIL: testCenterOutput4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 128, in testCenterOutput4DInput
    self.doOutputTest((100, 10, 10, 4), center=True, scale=False)
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.955%)
 x: array([ 0.203107,  0.734946,  0.61085 , ...,  0.683154,  0.015152,
        0.541072], dtype=float32)
 y: array([-0.845671,  1.303705,  0.631302, ..., -0.355135,  1.698526, -0.87971 ], dtype=float32)

======================================================================
FAIL: testFusedGradient2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 198, in testFusedGradient2DInput
    self.doGradientTest((10, 300), center=True, scale=True)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.00018312, atol=0.00018312

(mismatch 100.0%)
 x: array([ 0.424725,  0.375305,  0.994218, ...,  0.685622,  0.663317,
        0.384126], dtype=float32)
 y: array([-0.102503,  0.312273,  1.270906, ...,  1.547811,  0.089999,
       -0.485229], dtype=float32)

======================================================================
FAIL: testFusedGradient4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 201, in testFusedGradient4DInput
    self.doGradientTest((100, 10, 10, 64), center=True, scale=True)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000238509, atol=0.000238509

(mismatch 99.9815625%)
 x: array([ 0.,  0.,  0., ...,  0.,  0.,  0.], dtype=float32)
 y: array([-0.131427,  0.319407,  1.236609, ..., -0.907748, -0.127978,
        0.364285], dtype=float32)

======================================================================
FAIL: testFusedGradientSmallDepth (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 204, in testFusedGradientSmallDepth
    self.doGradientTest((100, 10, 10, 16), center=True, scale=True)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000362113, atol=0.000362113

(mismatch 99.96125%)
 x: array([ 0.659491,  0.7554  ,  0.016271, ...,  0.795022,  0.557007,
        0.468666], dtype=float32)
 y: array([-0.249867,  0.032941,  1.007818, ...,  0.035675, -0.935067,
        1.282567], dtype=float32)

======================================================================
FAIL: testFusedOutput2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 131, in testFusedOutput2DInput
    self.doOutputTest((10, 300), center=True, scale=True)
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.9333333333%)
 x: array([ 0.643219,  0.997678,  0.875468, ...,  0.795216,  0.377428,
        0.720948], dtype=float32)
 y: array([-1.112463,  0.940821,  0.298479, ...,  0.149518,  1.153029,
       -0.802579], dtype=float32)

======================================================================
FAIL: testFusedOutput4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 134, in testFusedOutput4DInput
    self.doOutputTest((100, 10, 10, 4), center=True, scale=True)
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.955%)
 x: array([ 0.203107,  0.734946,  0.61085 , ...,  0.683154,  0.015152,
        0.541072], dtype=float32)
 y: array([-0.845671,  1.303705,  0.631302, ..., -0.355135,  1.698526, -0.87971 ], dtype=float32)

======================================================================
FAIL: testScaleGradient2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 180, in testScaleGradient2DInput
    self.doGradientTest((2, 300), center=False, scale=True)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000179154, atol=0.000179154

(mismatch 100.0%)
 x: array([ 0.505505,  0.452031,  0.993593,  0.491045,  0.500558,  0.508332,
        0.513271,  0.111035,  0.867091,  0.313299,  0.934477,  0.431223,
        0.943781,  0.9258  ,  0.182238,  0.314917,  0.043   ,  0.584436,...
 y: array([ -1.025029e-01,   3.122734e-01,   1.270906e+00,   2.448858e-01,
        -1.191792e+00,   9.966743e-02,   7.369004e-01,  -1.459053e-01,
         1.486707e-01,  -6.541969e-01,   4.261979e-01,  -5.954510e-01,...

======================================================================
FAIL: testScaleGradient4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 183, in testScaleGradient4DInput
    self.doGradientTest((100, 10, 10, 64), center=False, scale=True)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000238509, atol=0.000238509

(mismatch 99.9815625%)
 x: array([ 0.,  0.,  0., ...,  0.,  0.,  0.], dtype=float32)
 y: array([-0.131427,  0.319407,  1.236609, ..., -0.907748, -0.127978,
        0.364285], dtype=float32)

======================================================================
FAIL: testScaleGradientSmallDepth (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 186, in testScaleGradientSmallDepth
    self.doGradientTest((100, 10, 10, 16), center=False, scale=True)
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000362113, atol=0.000362113

(mismatch 99.95875%)
 x: array([ 0.44853 ,  0.945891,  0.187744, ...,  0.413918,  0.296722,
        0.124362], dtype=float32)
 y: array([-0.249867,  0.032941,  1.007818, ...,  0.035675, -0.935067,
        1.282567], dtype=float32)

======================================================================
FAIL: testScaleOutput2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 119, in testScaleOutput2DInput
    self.doOutputTest((10, 300), center=False, scale=True)
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.9666666667%)
 x: array([ 0.594076,  0.061286,  0.571212, ...,  0.494667,  0.623868,
        0.076777], dtype=float32)
 y: array([-1.112463,  0.940821,  0.298479, ...,  0.149518,  1.153029,
       -0.802579], dtype=float32)

======================================================================
FAIL: testScaleOutput4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 122, in testScaleOutput4DInput
    self.doOutputTest((100, 10, 10, 4), center=False, scale=True)
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.9675%)
 x: array([ 0.145753,  0.622535,  0.582887, ...,  0.784483,  0.814334,
        0.717926], dtype=float32)
 y: array([-0.845671,  1.303705,  0.631302, ..., -0.355135,  1.698526, -0.87971 ], dtype=float32)

======================================================================
FAIL: testVanillaGradient2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 171, in testVanillaGradient2DInput
    self.doGradientTest((10, 300))
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.00018312, atol=0.00018312

(mismatch 100.0%)
 x: array([ 0.772077,  0.241859,  0.700467, ...,  0.19091 ,  0.727311,
        0.732492], dtype=float32)
 y: array([-0.102502,  0.312274,  1.270906, ...,  1.547811,  0.089998,
       -0.485229], dtype=float32)

======================================================================
FAIL: testVanillaGradient4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 174, in testVanillaGradient4DInput
    self.doGradientTest((100, 10, 10, 64))
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000238509, atol=0.000238509

(mismatch 99.9815625%)
 x: array([ 0.,  0.,  0., ...,  0.,  0.,  0.], dtype=float32)
 y: array([-0.131428,  0.319407,  1.236609, ..., -0.907748, -0.127978,
        0.364286], dtype=float32)

======================================================================
FAIL: testVanillaGradientSmallDepth (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 177, in testVanillaGradientSmallDepth
    self.doGradientTest((100, 10, 10, 16))
  File "layer_norm_fused_test.py", line 168, in doGradientTest
    _c.ravel(), _g.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.000362113, atol=0.000362113

(mismatch 99.95875%)
 x: array([ 0.44853 ,  0.945891,  0.187744, ...,  0.413918,  0.296722,
        0.124362], dtype=float32)
 y: array([-0.249867,  0.032941,  1.007818, ...,  0.035674, -0.935067,
        1.282567], dtype=float32)

======================================================================
FAIL: testVanillaOutput2DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 113, in testVanillaOutput2DInput
    self.doOutputTest((10, 300))
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.9333333333%)
 x: array([ 0.646108,  0.228681,  0.329904, ...,  0.00984 ,  0.904295,
        0.977675], dtype=float32)
 y: array([-1.112463,  0.940821,  0.298479, ...,  0.149518,  1.153029,
       -0.802579], dtype=float32)

======================================================================
FAIL: testVanillaOutput4DInput (__main__.LayerNormCustomTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "layer_norm_fused_test.py", line 116, in testVanillaOutput4DInput
    self.doOutputTest((100, 10, 10, 4))
  File "layer_norm_fused_test.py", line 110, in doOutputTest
    outputs.ravel(), golds.ravel(), rtol=tol, atol=tol)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/test_util.py", line 485, in assertAllClose
    np.testing.assert_allclose(a, b, rtol=rtol, atol=atol)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 1411, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/utils.py", line 796, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.0005, atol=0.0005

(mismatch 99.955%)
 x: array([ 0.203107,  0.734946,  0.61085 , ...,  0.683154,  0.015152,
        0.541072], dtype=float32)
 y: array([-0.845671,  1.303705,  0.631302, ..., -0.355135,  1.698526, -0.87971 ], dtype=float32)

----------------------------------------------------------------------
Ran 27 tests in 5.994s

FAILED (failures=20)

NickShahML avatar Mar 06 '17 15:03 NickShahML

@NickShahML Haha, yes, the more info the merrier. I will take a look at this now.

MycChiu avatar Mar 09 '17 00:03 MycChiu

@NickShahML I tried to compile for sm_50 and ran into a similar problem; as it turned out, TensorFlow was not launching the CUDA kernel at all. I suspected that the -arch argument passed to nvcc in the makefile was insufficient and the kernel was never actually compiled, so I modified it according to this doc, and it should work now. However, since I added compute capability >6.0 support to the current makefile, I am not sure it will still build on Maxwell cards; if you run into problems when compiling, try building with makefile_maxwell instead.

MycChiu avatar Mar 09 '17 05:03 MycChiu
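
For reference, targeting several GPU generations with nvcc is typically done with multiple `-gencode` pairs rather than a single `-arch` flag. A hypothetical fragment in the spirit of the fix described above (the flag values and `TF_INC` variable are illustrative, not the repository's exact makefile):

```makefile
# Hypothetical nvcc invocation covering Maxwell (sm_50/52) and
# Pascal (sm_60/61); a Maxwell-only build would drop the sm_6x entries.
nvcc -std=c++11 -c -o layer_norm_fused_op_gpu.cu.o layer_norm_fused_op_gpu.cu.cc \
	-I$(TF_INC) -D GOOGLE_CUDA=1 -x cu -Xcompiler -fPIC --expt-relaxed-constexpr \
	-gencode arch=compute_50,code=sm_50 \
	-gencode arch=compute_52,code=sm_52 \
	-gencode arch=compute_60,code=sm_60 \
	-gencode arch=compute_61,code=sm_61
```

With only `-arch=sm_60`, a Maxwell (sm_50/52) card has no matching cubin or PTX to JIT from, so the kernel silently never launches, which matches the symptom described above.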

@MycChiu Thanks again for resolving this. As an update, all 27 tests now pass!!

Unfortunately, however, when I run the updated .so file on a Maxwell card, the layer norm seems to take up a vast amount of memory, and computation is much slower (about 2.5x). My results are still early, so it's hard to tell whether the layer norm is actually achieving a lower loss. I'll update you in the next 12 hours or so about what happens.

NickShahML avatar Mar 09 '17 18:03 NickShahML

@NickShahML Hmm... this is definitely weird. Could you run the updated layer_norm_bench_mark.py from the latest commit and paste the generated benchmark_ratio.png here, so I can compare the performance against the built-in layer norm? (It requires seaborn and pandas, though.)

MycChiu avatar Mar 10 '17 02:03 MycChiu
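
For readers without the benchmark script handy, the general shape of such a micro-benchmark can be sketched with a NumPy reference implementation standing in for the fused and built-in TF ops (the function name, shapes, and iteration count here are illustrative assumptions, not the script's actual code):

```python
import timeit
import numpy as np

def layer_norm_ref(x, epsilon=1e-12):
    """Reference layer normalization over the last axis."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon)

# One of the shapes exercised by the test suite above.
x = np.random.rand(100, 10, 10, 64).astype(np.float32)
out = layer_norm_ref(x)

# Normalized slices should have ~zero mean and ~unit variance.
print(np.allclose(out.mean(axis=-1), 0.0, atol=1e-4))  # True
print(np.allclose(out.var(axis=-1), 1.0, atol=1e-3))   # True

# Time many invocations to smooth out per-call overhead, as a
# benchmark comparing two implementations would.
elapsed = timeit.timeit(lambda: layer_norm_ref(x), number=100)
print("avg ms per call: %.3f" % (elapsed / 100 * 1e3))
```

Comparing ratios of such timings between two implementations, rather than absolute times, is what makes the benchmark_ratio.png plot portable across machines.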

Hey @MycChiu, I ran the benchmark and it looks similar to what you have on GitHub. The slowdown may come from the fact that I'm ultimately testing this layer norm in a seq2seq model. Attached is the image from my run on a 980 Ti.

benchmark_ratio

I'll try running it on a completely different RNN network and see what I get. Thanks!

NickShahML avatar Mar 12 '17 16:03 NickShahML

@NickShahML Thank you for the benchmark results. It is quite interesting that the kernel's performance only suffers in a seq2seq model; do you have a benchmark snippet I could run to see if it also happens on my system?

MycChiu avatar Mar 14 '17 06:03 MycChiu

Hey @MycChiu, you can run TensorFlow's seq2seq model here:

https://www.tensorflow.org/tutorials/seq2seq#tensorflow_seq2seq_library

I think this is probably the easiest way for you to test it. They provide an English-to-French translation model.

NickShahML avatar Mar 15 '17 14:03 NickShahML

Hi, I was facing the same error as @cloofa and opened a PR that fixes it.

ponythewhite avatar Mar 26 '17 14:03 ponythewhite
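
For readers hitting the `...B5cxx11...` undefined-symbol variant reported at the top of the thread: with gcc 5+ this kind of symbol mismatch against a pip-installed TensorFlow binary was commonly a C++ ABI issue, and the usual workaround at the time was compiling the op with `-D_GLIBCXX_USE_CXX11_ABI=0`. A hypothetical link step showing where the flag goes (a general note on this error class, not necessarily what the linked PR changes):

```shell
# Hypothetical g++ link step; the ABI define makes the op library match
# TensorFlow wheels built with the pre-C++11 std::string ABI.
g++ -std=c++11 -shared -o layer_norm_fused_op.so layer_norm_fused_op.cc \
	layer_norm_fused_op_gpu.cu.o \
	-I$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())') \
	-fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -L/usr/local/cuda/lib64 -lcudart
```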