pycbc
How do I debug not-using-GPU?
I have carefully followed the Instructions to add CUDA support. I was hoping qtransforms would now use the GPU.
On startup, I see this, which looks promising:
/home/ec2-user/.conda/envs/kaggle-gw/lib/python3.9/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
warnings.warn('creating CUBLAS context to get version number')
I am also running the following code on startup:
from pycbc.fft import backend_support

fft_backends = backend_support.get_backend_names()
print(f"Available fft backends: {fft_backends}")
if "cuda" in fft_backends:
    print("Using cuda.")
    backend_support.set_backend(["cuda"])
I see the following output:
Available fft backends: ['mkl', 'numpy', 'cuda']
Using cuda.
But when I run a batch of qtransforms, no load appears on my GPU.
What do I do next?
I am working with Dean on this project; we need this working for a Kaggle competition to make our transformations faster, and we are in a bit of a time crunch. @ahnitz @spxiwh any pointers?
Hi both, apologies for slow response on this.
I should probably preface all this by saying that GPU support in PyCBC is not at the level of GPU support in something like tensorflow. We do not have many GPU experts in PyCBC (in fact we have exactly 1), and I know nothing of CUDA programming myself. The GPU effort that we have has gone into making the matched-filtering component of our production "searching for compact binary merger" workflow run efficiently on GPUs. That application is well optimized for GPUs, but that took a lot of effort to achieve. In other places outside of this critical path the code may run on GPUs, but there is no guarantee of optimality (in fact suboptimality might be expected); in still other places the code can only run on CPUs.
Looking over our Q-transform code (https://github.com/gwastro/pycbc/blob/master/pycbc/filter/qtransform.py), I think this code should run on a GPU, as it's a bunch of array operations and FFTs. I'm assuming you're using the qtransform method of the TimeSeries object. However, instead of using set_fft_backend, you want something a bit more "all in" to run properly on GPUs. This would look like:
from pycbc.scheme import CUDAScheme
...
with CUDAScheme(device_num=0) as ctx:
    # No TimeSeries is to be created until here
    # All production code lies within this with block
This will make all TimeSeries created within the "with CUDAScheme() as ctx:" block use pycuda.gpuarray.GPUArray objects and not numpy.ndarray objects. Just make sure that the data TimeSeries is created within this block so that it is correctly a GPUArray. In the CUDAScheme the only valid FFT backend should be 'cuda', and it should then be chosen by default.
If all this still appears to run only on the CPU, it might be worth profiling the code to make sure it is calling into GPU routines and not just doing FFTs using fftw/mkl/numpy. Beyond that, though, I'm not sure I can be of much help debugging GPU things, sorry.
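A quick, library-independent way to check where the time actually goes is the standard library's cProfile: run the workload under the profiler and scan the hottest entries for pycuda/skcuda frames (GPU path) versus numpy/mkl FFT frames (CPU fallback). A minimal sketch — the workload function here is only a stand-in for your qtransform loop:

```python
import cProfile
import io
import pstats


def workload():
    # Stand-in for the qtransform loop; replace with your real code.
    total = 0
    for i in range(1000):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Print the hottest functions by cumulative time. In the real run, look
# for pycuda/skcuda entries (GPU path) versus numpy.fft/mkl entries
# (CPU fallback) near the top of this list.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)
print(stream.getvalue())
```

Watching nvidia-smi in a second terminal while the profiled run executes gives a complementary, if coarser, signal of whether the GPU is being touched at all.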