CUDNN Runtime detection
Would it be possible to have CUDNN detected at runtime, so that if the user does not have an NVIDIA video card, dlib falls back to CPU-based BLAS libraries (like Intel MKL)?
It's possible to write code like that, but that isn't how the dlib code is set up now. If someone wants to make a PR that does this I'm fine with it, so long as they figure out how to make it not confusing to users, which is the central challenge. A very large number of dlib users are just beginning to program and have no understanding of what linking is, which is attested to by the hundreds of questions posted about dlib. So doing this in a way that doesn't increase the number of confused questions is the central challenge.
So how about always falling back to the internal CBLAS if CUDNN doesn't exist? As a stepping stone, we know this adds no extra linkage (it is already compiled in), and it is a sane fallback because otherwise the application just crashes.
Yes, that's the general idea. But there are a lot of little details. How does the application find out whether cudnn exists in a way that doesn't lead to user confusion? How does a user tell the application where to find CUDA and deal with the runtime linking options?
Rather, you would check whether a CUDA device exists. NVIDIA has functions for this in the driver library.
You would call cuInit, then check that cuDeviceGetCount returns a nonzero count, and then call cuDeviceComputeCapability.
Yes, you could do that. It wouldn't be very difficult to do since all the calls are hidden inside the code in dlib::tt which switches to either the CPU or GPU from there. Making that runtime switchable is probably a good idea and something someone should submit in a pull request.
However, my main point is about dealing with systems that don't have CUDA and/or cudnn installed. That's the main pain point people complain about. They compile the application with CUDA code in it, then they try to run it on another computer that doesn't have CUDA installed, and they get a runtime linking error. My impression, based on user feedback, is that this is far and away the central problem.
Anyway, if you want to submit a PR that adds a runtime switch to dlib::tt that shunts the codepath back to the CPU based on the state of the switch that would be cool :)
Not having CUDA/cudnn installed is a separate issue (not related to this). On my side I just distribute the CUDA DLLs with my app (note: for cuDNN we needed permission from NVIDIA), so users don't actually need it installed.
Right, I know. I'm just thinking about the sorts of questions I'm likely to get. I've become kind of crotchety as a result of so many ignorant questions :(
Anyway, what you are proposing sounds good. You should submit a PR :)
It would be nice if there was a base implementation class that you could build a CPUImpl or GPUImpl on top of. Right now everything in dlib::tt is just behind #ifdefs, which is a bit nasty to work with.
You don't need a class. You could just make a global function that returns a bool, like bool use_cuda(), and put it in the dlib::tt namespace. Then make all the routines in tt switch based on that. Anyway, you should submit a PR :)