Knet.jl
multi-gpu support
This is a general issue to develop support for various multi-gpu techniques in Knet.
Are there plans to implement multi-GPU support? I'm considering re-implementing my PyTorch models using Knet, but multi-GPU support is an important factor since I'm working on a cluster.
Thanks!
The last person to work on this was @cangumeli, who wrote Knet/src/uva.jl. Below is what he says about the latest state of affairs (note that this predates the new multithreading interface). I'd love to work on this with someone who knows the requirements better than I do.
We currently have GPU-direct support in Knet. You can access another GPU's memory without copying, and you can copy from one GPU to another directly. All you need to do is call Knet.enableP2P() to use these features. See https://github.com/denizyuret/Knet.jl/blob/master/src/uva.jl for the implementation.
On the other hand, we don't know how to parallelize model execution properly. The following options are worth trying:
- Julia has multi-process parallelism features. This will require synchronization on the CPU, as GPU-direct doesn't play well with multi-processing.
- You can use CUDA streams to execute multiple CUDA kernels in parallel, on one or more GPUs. With this, you can utilize GPU-direct. However, support for CUDA streams must be added to Knet kernels for that, and cudaMalloc and cudaFree cannot be parallelized with CUDA streams. See https://www.juliabloggers.com/multiple-gpu-parallelism-on-the-hpc-with-julia/ for an old example by CUDART people.
- You can try Julia threads. With threads, you can parallelize everything, but Knet and many IO features of Julia are not thread-safe. A proper lock that only parallelizes GPU kernels and not Julia execution may help. PyTorch people are doing this with the Python GIL, as far as I know.
- Finally, there are CUDA-aware MPI libraries like MVAPICH that you may try to use with MPI.jl.
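To make the threads option a bit more concrete, here is a minimal sketch of the general pattern: run the heavy per-shard work concurrently and serialize only the shared-state update behind a lock. It uses plain CPU arrays and Julia's standard library only; it does not call any Knet or CUDA API. The names `shard_gradient` and `parallel_gradient` are hypothetical, and the linear least-squares model is just a stand-in for whatever each GPU would actually compute.

```julia
using Base.Threads: @spawn

# Hypothetical stand-in for per-device work: gradient of the mean squared
# error of a linear model, L(w) = mean((x*w .- y).^2), on one data shard.
function shard_gradient(w::Vector{Float64}, x::Matrix{Float64}, y::Vector{Float64})
    residual = x * w .- y
    return 2 .* (x' * residual) ./ length(y)
end

# Split the batch into row shards, compute each shard's gradient in its own
# task, and combine them into the full-batch gradient.
function parallel_gradient(w, x, y; nworkers::Int = 2)
    n = size(x, 1)
    shards = collect(Iterators.partition(1:n, cld(n, nworkers)))
    total = zeros(length(w))
    guard = ReentrantLock()   # protects the shared accumulator only
    @sync for rows in shards
        @spawn begin
            # Runs concurrently; in a real setup this is where a task
            # would launch work on its own device (assumption).
            g = shard_gradient(w, x[rows, :], y[rows])
            # Serialized section: stands in for code that is not thread-safe.
            lock(guard) do
                total .+= g .* length(rows)   # weight by shard size
            end
        end
    end
    return total ./ n
end
```

Weighting each shard's gradient by its row count makes the result equal to the single-worker gradient even when the shards are uneven. Start Julia with more than one thread (for example `julia -t 4`) for the tasks to actually run in parallel.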
Thank you for the update. It sounds like it wouldn't be easy to implement at the moment (especially for a Julia beginner like myself), but it could be supported at some point.
I'll keep an eye on this from time to time. I'm interested enough in Julia that I think I'll implement my models anyway, and if at some point multi-GPU becomes easier to incorporate, I'll add it to them.
Thanks for your prompt reply and for all your work on Knet.