cudnn.torch
Slow loading time
Any idea why require 'cudnn' may take 45 seconds on my machine?
th> require 'cunn';
[0.9818s]
th> require 'cudnn';
[44.7415s]
Edit: Oh, maybe this is related. Edit2: System info is the following.
Distributor ID: CentOS
Description: CentOS Linux release 7.4.1708 (Core)
Release: 7.4.1708
Codename: Core
Hmm, another server here takes 10 to 15 seconds... and the one above, 40 to 45 seconds... How can I debug this?
`require 'cudnn'` initializes some state on every visible GPU. If you're on a machine with many GPUs, that may be the cause of the long loading time.
We've got a machine with 4 GPUs. Setting CUDA_VISIBLE_DEVICES=0 (for instance) reduces the loading time by almost a factor of 4. On our machine it takes <10 sec, though...
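A minimal sketch of that experiment, assuming `th` is on the PATH (the actual `time th ...` call is left commented out since it needs a GPU box):

```shell
# Restrict CUDA device enumeration to GPU 0 before launching torch, so
# cudnn's per-GPU initialization only touches a single device.
export CUDA_VISIBLE_DEVICES=0
echo "visible devices: $CUDA_VISIBLE_DEVICES"
# time th -e "require 'cudnn'"   # assumed command; should now init only GPU 0
```

On a 4-GPU machine this roughly quarters the init work, which matches the factor-4 speedup reported above.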
@clement-masson, right. I just saw that. Still, I believe something must be wrong. I've contacted IT (I don't have sudo here...).
I'm finding that `require 'cudnn'` on a Volta takes 10 minutes. @clement-masson, any idea how I can profile the require call to see what exactly is taking so long on the Volta architecture?
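One coarse way to profile it, assuming `th` is installed: time each layer of the require chain separately, so it's visible whether the time goes into cutorch's device init or into cudnn itself. The loop below is illustrative; the actual `time th ...` call is commented out since it needs the GPU environment.

```shell
# Time each dependency in turn; the stage whose wall time jumps on Volta
# is the one to dig into (e.g. with strace -c th -e "require 'cudnn'").
for mod in torch cutorch cudnn; do
  echo "== require '$mod' =="
  # time th -e "require '$mod'"
done
```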
@nagadomi , I'm using your distro with cuda9/10 support. Any ideas why the bindings might be struggling with the Volta architecture?
@ajhool If you are using Docker, it may be caused by JIT Caching. See https://github.com/nagadomi/waifu2x/pull/138 , https://github.com/nagadomi/waifu2x/pull/138/files#diff-04c6e90faac2675aa89e2176d2eec7d8
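The linked fix boils down to keeping NVIDIA's JIT compute cache on a volume that survives container restarts, and raising its size cap so freshly compiled Volta kernels aren't evicted. A sketch using the standard `CUDA_CACHE_PATH` / `CUDA_CACHE_MAXSIZE` environment variables (the directory below is an assumption; in Docker it would be a volume-mounted path):

```shell
# Point the CUDA JIT cache at a persistent directory and raise the size
# cap, so PTX-to-SASS compilation results are reused across runs.
CACHE_DIR="${TMPDIR:-/tmp}/nv-compute-cache"
mkdir -p "$CACHE_DIR"
export CUDA_CACHE_PATH="$CACHE_DIR"
export CUDA_CACHE_MAXSIZE=2147483647   # bytes; the default cap is much smaller
echo "JIT cache: $CUDA_CACHE_PATH"
```

The first run still pays the full JIT cost; subsequent runs should hit the cache, provided the directory is actually preserved between containers.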
I am using docker and I'll give that a shot, thanks!
So far, the JIT Caching fix does not appear to be working, although I'm having a hard time debugging Torch/Lua without a debug environment or print statements. I believe I have the cache and cache path configured correctly, yet the load time is still about 10 minutes.
The fact that the code loads quickly on K80s but takes so much longer on Voltas makes me suspect there's more to it than just luajit. Will continue trying to get to the bottom of this.