
Slow loading time

Open Atcold opened this issue 8 years ago • 8 comments

Any idea why require 'cudnn' may take 45 seconds on my machine?

th> require 'cunn';
                                                                      [0.9818s]
th> require 'cudnn';
                                                                      [44.7415s]

Edit: Oh, maybe this is related. Edit 2: System info is the following:

Distributor ID: CentOS
Description:    CentOS Linux release 7.4.1708 (Core) 
Release:        7.4.1708
Codename:       Core

Atcold avatar Oct 28 '17 04:10 Atcold

Hmm, other servers here take 10 to 15 seconds... and the one above takes 40 to 45 seconds... How can I debug this?

Atcold avatar Oct 29 '17 23:10 Atcold

`require 'cudnn'` initializes some state on every visible GPU. If you're on a machine with many GPUs, that may be the cause of the long loading time.

We've got a machine with 4 GPUs. Setting CUDA_VISIBLE_DEVICES=0 (for instance) reduces the loading time by almost a factor of 4. On our machine it takes <10 sec, though...
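A minimal sketch of that workaround, assuming you want to expose only one device (the index 0 is just an example; a comma-separated list also works):

```shell
# Expose a single GPU to CUDA so require 'cudnn' initializes one device
# instead of every GPU in the machine.
export CUDA_VISIBLE_DEVICES=0
# th    <- launch the Torch REPL from this same shell, then require 'cudnn'
```

The variable must be set before the process starts; changing it inside an already-running `th` session has no effect on devices CUDA has already enumerated.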

clement-masson avatar Oct 30 '17 10:10 clement-masson

@clement-masson, right. I just saw that. Still, I believe something must be wrong. I've contacted IT (I don't have sudo here...).

Atcold avatar Oct 30 '17 22:10 Atcold

I'm finding that `require 'cudnn'` on a Volta takes 10 minutes. @clement-masson, any idea how I can profile the require function to see what exactly is taking so long on the Volta architecture?
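One coarse way to localize the slowdown from outside Lua is to time each layer of the stack separately. A hedged sketch (it assumes the `th` binary is on PATH and does nothing otherwise):

```shell
# Time each require on its own to see which layer pays the cost:
# cutorch (driver/context init), cunn, then cudnn (per-GPU setup).
for mod in cutorch cunn cudnn; do
  if command -v th >/dev/null 2>&1; then
    printf '%s:\n' "$mod"
    time th -e "require '$mod'"
  fi
done
```

If that only confirms cudnn is the slow step, running the same command under `strace -f -tt` can show whether the time is spent in device ioctls (initialization / JIT compilation) rather than in Lua itself.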

ajhool avatar Jan 26 '19 08:01 ajhool

@nagadomi , I'm using your distro with cuda9/10 support. Any ideas why the bindings might be struggling with the Volta architecture?

ajhool avatar Jan 31 '19 04:01 ajhool

@ajhool If you are using Docker, it may be caused by JIT Caching. See https://github.com/nagadomi/waifu2x/pull/138 , https://github.com/nagadomi/waifu2x/pull/138/files#diff-04c6e90faac2675aa89e2176d2eec7d8
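For context, the fix in the linked PR revolves around NVIDIA's compute cache: kernels JIT-compiled for a new architecture (such as Volta) are cached on disk, and if that cache is discarded with the container or hits its size cap, every startup pays the full recompilation cost again. A sketch under those assumptions (the path below is an example; in Docker, point it at a mounted volume so it survives restarts):

```shell
# Persist and enlarge the CUDA JIT compute cache (default location is
# ~/.nv/ComputeCache with a modest size cap).
export CUDA_CACHE_PATH="${HOME}/.nv/ComputeCache"
export CUDA_CACHE_MAXSIZE=2147483648   # 2 GB; JIT output for newer archs is large
mkdir -p "$CUDA_CACHE_PATH"
```

After one slow warm-up run populates the cache, subsequent loads should be fast as long as the directory persists.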

nagadomi avatar Jan 31 '19 06:01 nagadomi

I am using docker and I'll give that a shot, thanks!

ajhool avatar Jan 31 '19 06:01 ajhool

So far, the JIT caching fix does not appear to be working, although I'm having a hard time debugging Torch/Lua without a debug environment or print statements. I believe the cache and cache path are configured correctly, yet the load time is still about 10 minutes.

The fact that the code executes quickly on K80s but takes so much longer on Voltas makes me suspect there's more to it than just LuaJIT. Will continue to try to get to the bottom of this.

ajhool avatar Feb 01 '19 05:02 ajhool