Eric Jang
Very strange. I'll take a look at it tomorrow evening. Thanks for catching this! On Thursday, May 29, 2014, serpheroth [email protected] wrote: > Both versions have this problem. Here...
Thanks a lot for looking into this - ah, that does make sense. According to the paper, 10-15 iterations are typically sufficient to diagonalize S to within single-precision roundoff error....
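For context, a minimal sketch of the kind of iterative diagonalization being referenced: cyclic Jacobi rotation sweeps over a symmetric matrix S, where each sweep drives the off-diagonal entries toward zero until they fall below single-precision roundoff. This is a generic NumPy illustration (the names `jacobi_diagonalize` and `sweeps` are mine), not the code from the paper or repository under discussion.

```
import numpy as np

def jacobi_diagonalize(S, sweeps=15):
    """Cyclic Jacobi sweeps: repeatedly annihilate off-diagonal entries of a
    symmetric matrix S with plane rotations. Returns (eigenvalues, eigenvectors)."""
    A = np.array(S, dtype=np.float32)
    n = A.shape[0]
    V = np.eye(n, dtype=np.float32)            # accumulated rotations
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < 1e-12:       # already (numerically) zero
                    continue
                # Rotation angle chosen so the updated A[p, q] becomes zero.
                theta = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
                t = 1.0 / (abs(theta) + np.sqrt(theta * theta + 1.0))
                if theta < 0.0:
                    t = -t
                c = 1.0 / np.sqrt(t * t + 1.0)
                s = t * c
                J = np.eye(n, dtype=np.float32)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J                # similarity transform keeps eigenvalues
                V = V @ J
    return np.diag(A), V

# Example: off-diagonal mass shrinks rapidly with each sweep.
S = np.random.rand(3, 3).astype(np.float32)
S = (S + S.T) / 2.0                            # make it symmetric
eigvals, eigvecs = jacobi_diagonalize(S, sweeps=15)
```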
gr8, thanks very much. On Thu, Aug 29, 2013 at 10:58 AM, Daniel Mendel [email protected]: > That said, we should test this with Morsel as Zach outlined above - >...
Thanks for bringing this to my attention. It's probably not your fault; I suspect it's a bug in how TDB deals with Operations on tf.Variables. That would explain...
I am also running into this error (using rocm/hipcaffe docker image)
I was able to get access to a P100 machine on GCP; here are the benchmark numbers for the same code: ``` epoch train time train loss train acc...
Ran the [torch bottleneck profiler](https://pytorch.org/docs/stable/bottleneck.html) and here are the results. The slowest ops on ROCm: ``` -------------------------------------------------------------------------------- cProfile output -------------------------------------------------------------------------------- 17427493 function calls (17155957 primitive calls) in 1003.873...
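For anyone wanting to reproduce this, the linked profiler is invoked as a module that wraps the training script (it runs both cProfile and the autograd profiler); the script name below is just a placeholder, not the actual file from this issue:

```
python -m torch.utils.bottleneck train_cifar10.py
```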
Friendly ping, any updates on this? Is there any information I can provide on my end that would be helpful?
Attached is my prof.out file. It's about 97 MB, so I've uploaded it to Google Drive here: https://drive.google.com/file/d/18yP9tBZj4bN1Da5dEsaio1J4Og4O1eGs/view?usp=sharing Here's the rpt summary output: ``` /opt/rocm/hcc/bin/rpt ~/cifar10-fast/prof.out ROI_START: GPU0 0.000000: +0.00 kernel #0.0.1...
Any tips on what I should do to speed things up? I'm training a fairly standard convnet setup, so I expect this will be a significant issue for PyTorch users...