Gregory Johnson

9 comments by Gregory Johnson

I'm having this same problem, using a Docker image ([here](https://github.com/gregjohnso/dl-docker/blob/master/Dockerfile.gpu)) with various large networks distributed in series across 2 or 3 Pascal Titan Xs. My observations: **Without cuDNN:** Works fine,...

Attached is a screenshot of `watch nvidia-smi` at the time of a crash. The temperatures are all within the normal range. ![screen shot 2017-02-15 at 12 03 16 pm](https://cloud.githubusercontent.com/assets/17319655/22993148/ccbb48be-f376-11e6-81e6-847f1aac115d.png)

+1, this would be pretty valuable for when I want to map an iterable of 300k+

We have a similar problem with training locking up on a CentOS system with 4 Pascal Titan Xs in an Ubuntu Docker container. We can exec into the Docker container,...

Hi Siewert, it turns out many of our users are reporting this issue with 11GB cards. Currently the only workaround is to reduce the batch size. I've updated the README...

@siewerthug were you able to find a workaround for this?

@mattrasmus this would be very helpful for us. What is the approval mechanism here?

hi @singram12 can you post the exact error you're having?

Are there any updates here? This would be very useful. We're running into a similar situation where we update sub-tasks and want the scheduler to descend the DAG, get to...