MycChiu

17 comments by MycChiu

Hmmm... Yes, I have been using them in all my models, and they worked properly. After reading through the code again, I think the culprit might be the `atomicAdd` function, since...

I fixed it. Looking at it again, though, the bug should only affect inputs with the float64 dtype, and since I forced float32 in the tests, this bug may not...
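Because the tests only ever fed float32, a float64-only bug has no way to show up in them. A test that loops over both dtypes closes that gap. Below is a minimal sketch of the idea, assuming a plain NumPy layer norm stands in for the real op. `layer_norm_reference`, `check_dtypes`, `float64_truncating_op` and the tolerances are all hypothetical, not code from the repository.

```python
import numpy as np


def layer_norm_reference(x, gamma, beta, eps=1e-6):
    """Layer normalization over the last axis, computed in x's own dtype."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def check_dtypes(op, dtypes=(np.float32, np.float64), shape=(4, 16), seed=0):
    """Compare `op` with the reference for every dtype in `dtypes`."""
    rng = np.random.default_rng(seed)
    results = {}
    for dtype in dtypes:
        dtype = np.dtype(dtype)
        x = rng.standard_normal(shape).astype(dtype)
        gamma = np.ones(shape[-1], dtype=dtype)
        beta = np.zeros(shape[-1], dtype=dtype)
        expected = layer_norm_reference(x, gamma, beta)
        actual = op(x, gamma, beta)
        # Tolerance roughly matched to each dtype's precision (assumed values).
        tol = 1e-4 if dtype == np.float32 else 1e-10
        results[dtype.name] = bool(np.allclose(actual, expected, rtol=tol, atol=tol))
    return results


def float64_truncating_op(x, gamma, beta):
    """Hypothetical buggy op: silently downcasts float64 input to float32."""
    if x.dtype == np.float64:
        x = x.astype(np.float32)
    return layer_norm_reference(x, gamma, beta)


# A float32-only test passes, so the float64 bug goes unnoticed.
print(check_dtypes(float64_truncating_op, dtypes=(np.float32,)))  # {'float32': True}
# Covering both dtypes exposes it.
print(check_dtypes(float64_truncating_op))  # {'float32': True, 'float64': False}
```

The float64 tolerance is deliberately much tighter than float32 precision, so any result that was really computed in float32 fails the float64 check.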

Yeah, failing the tests is definitely unusual. It would be nice if you could let me know which 7 of the tests actually passed, and some of the feedback...

@NickShahML Haha, yes, the more info the merrier. I will take a look at this now.

@NickShahML I tried compiling for `sm_50` and ran into a similar problem; as it turned out, TensorFlow was not launching the CUDA kernel at all. I suspected that...

@NickShahML Hmm... This is definitely weird. Could you run the updated `layer_norm_bench_mark.py` from the latest commit and paste the generated `benchmark_ratio.png` here, so I can see the performance compared...

@NickShahML Thank you for the benchmark results. It is quite interesting that the kernel's performance only suffers in a seq2seq model. Do you have a benchmark snippet I could run...