Eric Jang
Very strange. I'll take a look at it tomorrow evening. Thanks for catching this! On Thursday, May 29, 2014, serpheroth [email protected] wrote: > Both versions have this problem. Here...
Thanks a lot for looking into this - ah, that does make sense. According to the paper, 10-15 iterations are typically sufficient to diagonalize S to within single-precision roundoff error....
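For context, a minimal sketch of the kind of iterative diagonalization being referenced: cyclic Jacobi rotation sweeps over a symmetric matrix S, where each sweep drives the off-diagonal entries toward zero until they fall below single-precision roundoff. This is a generic NumPy illustration (the names `jacobi_diagonalize` and `sweeps` are mine), not the code from the paper or repository under discussion.

```
import numpy as np

def jacobi_diagonalize(S, sweeps=15):
    """Cyclic Jacobi sweeps: repeatedly annihilate off-diagonal entries of a
    symmetric matrix S with plane rotations. Returns (eigenvalues, eigenvectors)."""
    A = np.array(S, dtype=np.float32)
    n = A.shape[0]
    V = np.eye(n, dtype=np.float32)            # accumulated rotations
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < 1e-12:       # already (numerically) zero
                    continue
                # Rotation angle chosen so the updated A[p, q] becomes zero.
                theta = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
                t = 1.0 / (abs(theta) + np.sqrt(theta * theta + 1.0))
                if theta < 0.0:
                    t = -t
                c = 1.0 / np.sqrt(t * t + 1.0)
                s = t * c
                J = np.eye(n, dtype=np.float32)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J                # similarity transform keeps eigenvalues
                V = V @ J
    return np.diag(A), V

# Example: off-diagonal mass shrinks rapidly with each sweep.
S = np.random.rand(3, 3).astype(np.float32)
S = (S + S.T) / 2.0                            # make it symmetric
eigvals, eigvecs = jacobi_diagonalize(S, sweeps=15)
```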
gr8, thanks very much. On Thu, Aug 29, 2013 at 10:58 AM, Daniel Mendel [email protected]: > That said, we should test this with Morsel as Zach outlined above - >...
Thanks for bringing this to my attention. It's probably not your fault; I suspect it's a bug in how TDB deals with Operations on tf.Variables. That would explain...
I am also running into this error (using rocm/hipcaffe docker image)
I was able to get access to a P100 machine on GCP; here are the benchmark numbers for the same code: ``` epoch train time train loss train acc...
Ran the [torch bottleneck profiler](https://pytorch.org/docs/stable/bottleneck.html) and here are the results. The slowest ops on ROCm: ``` -------------------------------------------------------------------------------- cProfile output -------------------------------------------------------------------------------- 17427493 function calls (17155957 primitive calls) in 1003.873...
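For anyone wanting to reproduce this, the linked profiler is invoked as a module that wraps the training script (it runs both cProfile and the autograd profiler); the script name below is just a placeholder, not the actual file from this issue:

```
python -m torch.utils.bottleneck train_cifar10.py
```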
Friendly ping, any updates on this? Is there any information I can provide on my end that would be helpful?
Attached is my prof.out file. It's about 97 MB, so I've uploaded it to Google Drive here: https://drive.google.com/file/d/18yP9tBZj4bN1Da5dEsaio1J4Og4O1eGs/view?usp=sharing Here's the rpt summary output: ``` /opt/rocm/hcc/bin/rpt ~/cifar10-fast/prof.out ROI_START: GPU0 0.000000: +0.00 kernel #0.0.1...
Any tips on what I should do to speed things up? I'm training a fairly standard convnet setup, so I expect this will be a significant issue for PyTorch users...