CUDA based projection and backprojection calls in SART

Open roshtha opened this issue 4 years ago • 6 comments

Hello,

I compiled the .sln files under Functions/Sources in Visual Studio and built backprojectionDDb_mex_CUDA.mexw64 and projectionDDb_mex_CUDA.mexw64. The SART algorithm runs projection and backprojection in every iteration, once for each projection. The CUDA versions of these functions seem to accept only two parameters, so the projection number cannot be passed.

I modified the projection and backprojection calls in SART.m as follows:

proj_norm = projection(ones(parameter.ny, parameter.nx, parameter.nz, 'single'),parameter, []); to proj_norm = projectionDDb_mex_CUDA(ones(parameter.ny, parameter.nx, parameter.nz, 'double'),parameter);

vol_norm = backprojection(ones(parameter.nv, parameter.nu, parameter.nProj, 'single'), parameter, []); to vol_norm = backprojectionDDb_mex_CUDA(ones(parameter.nv, parameter.nu, parameter.nProj, 'double'), parameter);

proj_diff = proj(:,:,p) - projection(reconData3d,parameter,p); to proj_diff = proj(:,:,p) - projectionDDb_mex_CUDA(reconData3d,parameter,p);

upt_term = backprojection(proj_diff,parameter,p); to upt_term = backprojectionDDb_mex_CUDA(proj_diff,parameter,p);

Running SART then gives this error:

Error using projectionDDb_mex_CUDA
projection_mex requires two input arguments.

Error in SART (line 87)
        proj_diff = proj(:,:,p) - projectionDDb_mex_CUDA(reconData3d,parameter,p);

How can I run the CUDA versions of these functions for the SART iterations?

Thanks.

roshtha avatar May 05 '20 11:05 roshtha

Hi,

As you said, SART needs to perform an update for each projection, so the function has to accept the projection number as an input. The CPU versions already support this, but the GPU versions do not yet.

Actually, this is straightforward. If you are familiar with C++, you only need to modify this for loop:

for (unsigned int p = 0; p < nProj; p++)

and add an input argument for the projection number.

I will modify it myself, but you can make the change first if you need it sooner. Alternatively, you can use SIRT until I have updated it.

Let me know what you think.

Best.

rodrigovimieiro avatar May 05 '20 12:05 rodrigovimieiro

OK, thanks. I will modify it to accept the projection number.

roshtha avatar May 05 '20 13:05 roshtha

Hello @roshtha .

I have done the modifications. Please, test it and let me know if it works for you.

I have performed some simple tests and they worked.

The API works this way:

% Make the CUDA Backprojection
reconData3d = backprojectionDDb_mex_cuda(double(proj),parameter,-1);

% Make the CUDA Projection
projs = projectionDDb_mex_cuda(double(reconData3d), parameter, -1);

If you set nProj, the last parameter, to -1, it will run over all projections. Otherwise, it will compute only the projection you specify, e.g. 5.

It will throw an error if you set nProj equal to or greater than the number of projections specified in the parameters configuration file.
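
The rule above can be sketched on the C++ side roughly as follows. This is only an illustration and not the code in the repository: `projectionRange` is a hypothetical helper, and it assumes the projection index is zero-based. It maps the last argument to the range that the `for` loop over `p` would cover.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical helper (not part of the repository): converts the requested
// projection index into a half-open loop range [first, last).
// Assumption: indices are zero-based and any negative value means "all".
std::pair<unsigned int, unsigned int> projectionRange(int requested,
                                                      unsigned int nProj) {
    if (requested < 0) {
        // e.g. -1: loop over every projection
        return {0u, nProj};
    }
    const unsigned int p = static_cast<unsigned int>(requested);
    if (p >= nProj) {
        // Matches the documented behaviour: equal or greater is an error
        throw std::out_of_range("projection index must be smaller than nProj");
    }
    // Single projection: the loop body runs exactly once
    return {p, p + 1u};
}
```

The loop would then read `for (unsigned int p = range.first; p < range.second; p++)`, so the same kernel code would serve both the full pass (SIRT-style) and the per-projection pass that SART needs.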

Let me know if it is clear to you.

Best.

rodrigovimieiro avatar May 09 '20 14:05 rodrigovimieiro

Thank you so much. I will modify and let you know.

Thanks.

roshtha avatar May 09 '20 15:05 roshtha

I was able to build and call the GPU versions of projection and backprojection for SART. Thanks for the code updates. Now I am getting this error:

GPU Device 0: "Quadro K620" with compute capability 5.0 has 3 Multi-Processors and zu bytes of global memory

cudaMalloc Initial 
Error using projectionDDb_mex_CUDA
out of memory

Error in SART (line 68)
proj_norm = projectionDDb_mex_CUDA(ones(parameter.ny, parameter.nx, parameter.nz, 'double'),parameter,-1);

Error in Recon (line 71)
dataRecon3d = SART(double(dataProj),nIter,parameter);

gpuDevice shows

 CUDADevice with properties:


                      Name: 'Quadro K620'
                     Index: 1
         ComputeCapability: '5.0'
            SupportsDouble: 1
             DriverVersion: 10.2000
            ToolkitVersion: 7
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1475e+09
           AvailableMemory: 1.7065e+09
       MultiprocessorCount: 3
              ClockRateKHz: 1124000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

What is the minimum GPU memory required to run the CUDA versions?

roshtha avatar May 15 '20 13:05 roshtha

It failed when allocating these variables:

cudaMalloc((void **)&d_pProj, nDetX*nDetY*nProj * sizeof(double));
cudaMalloc((void **)&d_projI, nDetXMap*nDetYMap * sizeof(double));
cudaMalloc((void **)&d_pVolume, nPixXMap*nPixYMap*nSlices * sizeof(double));
cudaMalloc((void **)&d_pTubeAngle, nProj * sizeof(double));
cudaMalloc((void **)&d_pDetAngle, nProj * sizeof(double));

The total memory needed depends on the size of your projections and of the volume to be reconstructed. These two arrays take up most of the memory. You have only 2 GB, and your OS may also be using part of that memory for video processing.

You can try reconstructing fewer slices to see if it works. There is also the OpenMP version, which uses CPU RAM. You can try that as well.
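
As a rough way to check this before running, the five allocations listed above can be added up on the host. The sketch below is only an estimate under stated assumptions: `ReconDims` and `estimateDeviceBytes` are hypothetical names, the dimensions have to be filled in from your own parameter file, and any other buffers the MEX file may allocate are not counted.

```cpp
#include <cstdint>

// Hypothetical container for the sizes used in the cudaMalloc calls above.
struct ReconDims {
    std::uint64_t nDetX, nDetY;        // detector elements per projection
    std::uint64_t nDetXMap, nDetYMap;  // size of the d_projI buffer
    std::uint64_t nPixXMap, nPixYMap;  // in-plane size of the d_pVolume buffer
    std::uint64_t nSlices;             // number of reconstructed slices
    std::uint64_t nProj;               // number of projections
};

// Sums only the five buffers shown above; everything is stored as double.
std::uint64_t estimateDeviceBytes(const ReconDims& d) {
    const std::uint64_t elems =
        d.nDetX * d.nDetY * d.nProj             // d_pProj
      + d.nDetXMap * d.nDetYMap                 // d_projI
      + d.nPixXMap * d.nPixYMap * d.nSlices     // d_pVolume
      + 2 * d.nProj;                            // d_pTubeAngle + d_pDetAngle
    return elems * sizeof(double);
}
```

Comparing the result with the AvailableMemory value reported by gpuDevice (about 1.7e9 bytes here) gives a first indication of whether the allocation can succeed. Since the volume term scales linearly with nSlices, it also shows how much reducing the slice count would save.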

Hope it helps.

rodrigovimieiro avatar May 15 '20 21:05 rodrigovimieiro