DBT-Reconstruction
CUDA based projection and backprojection calls in SART
Hello,
I compiled the .sln files under Functions/Sources in the Visual Studio IDE and built backprojectionDDb_mex_CUDA.mexw64 and projectionDDb_mex_CUDA.mexw64. The SART algorithm calls projection and backprojection in every iteration, once for each projection. The CUDA versions of these functions seem to accept only two parameters, so the projection number cannot be passed.
I modified the projection and backprojection calls in SART.m as follows:
proj_norm = projection(ones(parameter.ny, parameter.nx, parameter.nz, 'single'),parameter, []);
to
proj_norm = projectionDDb_mex_CUDA(ones(parameter.ny, parameter.nx, parameter.nz, 'double'),parameter);
vol_norm = backprojection(ones(parameter.nv, parameter.nu, parameter.nProj, 'single'), parameter, []);
to
vol_norm = backprojectionDDb_mex_CUDA(ones(parameter.nv, parameter.nu, parameter.nProj, 'double'), parameter);
proj_diff = proj(:,:,p) - projection(reconData3d,parameter,p);
to
proj_diff = proj(:,:,p) - projectionDDb_mex_CUDA(reconData3d,parameter,p);
upt_term = backprojection(proj_diff,parameter,p);
to
upt_term = backprojectionDDb_mex_CUDA(proj_diff,parameter,p);
Running SART then produces this error:
Error using projectionDDb_mex_CUDA
projection_mex requires two input arguments.
Error in SART (line 87)
proj_diff = proj(:,:,p) - projectionDDb_mex_CUDA(reconData3d,parameter,p);
How can I run the CUDA versions of these functions for the SART iterations?
Thanks.
Hi,
As you said, SART needs to perform an update for each projection, which requires the function to accept the projection number as input. This was done in the CPU versions, but not yet in the GPU version.
Actually, this is straightforward. If you are familiar with C++, you only need to modify this for loop:
for (unsigned int p = 0; p < nProj; p++)
and add an input for the projection number.
I will modify it myself, but you can go ahead if you want it ready sooner. Alternatively, you can use SIRT until I modify it.
Let me know what you think.
Best.
OK, thanks. I will modify it to accept the projection number.
Hello @roshtha .
I have made the modifications. Please test them and let me know if they work for you.
I ran some simple tests and they worked.
The API works this way:
% Make the CUDA Backprojection
reconData3d = backprojectionDDb_mex_cuda(double(proj),parameter,-1);
% Make the CUDA Projection
projs = projectionDDb_mex_cuda(double(reconData3d), parameter, -1);
If you set nProj, the last parameter, to -1, it will run over all projections. Otherwise, it will compute only the projection specified, e.g. 5.
It will throw an error if you set nProj to be equal to or greater than the number of projections you specified in the parameters configuration file.
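The behaviour described above can be sketched in C++ as follows. This is only an illustration and not the code in the repository: the names ProjRange and resolveProjRange are hypothetical, and the sketch assumes projection indices are zero-based, which matches the error on values equal to or greater than the number of projections.

```cpp
#include <stdexcept>

// Hypothetical helper type: a half-open range [begin, end) of projection indices.
struct ProjRange {
    unsigned int begin;  // first projection index to process
    unsigned int end;    // one past the last projection index to process
};

// Hypothetical helper: maps the third argument to a loop range.
//   requested == -1 -> all projections
//   otherwise       -> only the projection with that (assumed zero-based) index
// Throws if the requested index is not a valid projection.
ProjRange resolveProjRange(int requested, unsigned int nProj) {
    if (requested == -1) {
        return {0u, nProj};
    }
    if (requested < 0 || static_cast<unsigned int>(requested) >= nProj) {
        throw std::out_of_range("projection number out of range");
    }
    const unsigned int p = static_cast<unsigned int>(requested);
    return {p, p + 1u};
}

// The original loop
//   for (unsigned int p = 0; p < nProj; p++)
// would then become something like
//   const ProjRange r = resolveProjRange(requested, nProj);
//   for (unsigned int p = r.begin; p < r.end; p++)
```

With this shape, a SART-style call passes a single index and the loop body runs once, while -1 keeps the original behaviour of looping over every projection.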
Let me know if it is clear to you.
Best.
Thank you so much. I will modify and let you know.
Thanks.
I was able to build and call the GPU versions of projection and backprojection for SART. Thanks for the code updates. Now I am getting this error:
GPU Device 0: "Quadro K620" with compute capability 5.0 has 3 Multi-Processors and zu bytes of global memory
cudaMalloc Initial
Error using projectionDDb_mex_CUDA
out of memory
Error in SART (line 68)
proj_norm = projectionDDb_mex_CUDA(ones(parameter.ny, parameter.nx, parameter.nz, 'double'),parameter,-1);
Error in Recon (line 71)
dataRecon3d = SART(double(dataProj),nIter,parameter);
gpuDevice shows
CUDADevice with properties:
Name: 'Quadro K620'
Index: 1
ComputeCapability: '5.0'
SupportsDouble: 1
DriverVersion: 10.2000
ToolkitVersion: 7
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 1.7065e+09
MultiprocessorCount: 3
ClockRateKHz: 1124000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
What is the minimum memory required to run the CUDA versions?
It failed when allocating these variables:
cudaMalloc((void **)&d_pProj, nDetX*nDetY*nProj * sizeof(double));
cudaMalloc((void **)&d_projI, nDetXMap*nDetYMap * sizeof(double));
cudaMalloc((void **)&d_pVolume, nPixXMap*nPixYMap*nSlices * sizeof(double));
cudaMalloc((void **)&d_pTubeAngle, nProj * sizeof(double));
cudaMalloc((void **)&d_pDetAngle, nProj * sizeof(double));
The total memory needed depends on the size of your projections and of the volume to be reconstructed, since these two take the largest share of the memory. You have only 2 GB, and your OS may also be using some of it for video processing.
You can try reconstructing fewer slices to see if it works. There is also the OpenMP version, which uses CPU RAM. You can try it as well.
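To get a rough idea of whether a given configuration fits, the sizes of the five allocations listed above can be added up before running. The sketch below is an estimate under assumptions, not part of the toolbox: DbtDims, estimateDeviceBytes and fitsInMemory are hypothetical names, and any other buffers the kernels allocate are not counted, so the real requirement will be higher.

```cpp
#include <cstdint>

// Hypothetical container for the dimensions used in the cudaMalloc calls above.
struct DbtDims {
    std::uint64_t nDetX, nDetY, nProj;   // projection stack (d_pProj)
    std::uint64_t nDetXMap, nDetYMap;    // single projection map (d_projI)
    std::uint64_t nPixXMap, nPixYMap;    // volume map in-plane size (d_pVolume)
    std::uint64_t nSlices;               // number of reconstructed slices
};

// Sums the bytes requested by the five listed allocations, using 64-bit
// arithmetic so large volumes do not overflow the intermediate products.
std::uint64_t estimateDeviceBytes(const DbtDims& d) {
    const std::uint64_t elem = sizeof(double);
    const std::uint64_t count =
        d.nDetX * d.nDetY * d.nProj +        // d_pProj
        d.nDetXMap * d.nDetYMap +            // d_projI
        d.nPixXMap * d.nPixYMap * d.nSlices  // d_pVolume
        + d.nProj                            // d_pTubeAngle
        + d.nProj;                           // d_pDetAngle
    return elem * count;
}

// Compares the estimate with the free device memory, for example the
// AvailableMemory value that gpuDevice reports in MATLAB.
bool fitsInMemory(const DbtDims& d, std::uint64_t availableBytes) {
    return estimateDeviceBytes(d) <= availableBytes;
}
```

Because the volume term scales with nSlices, reducing the number of slices is the most direct way to bring the estimate under the available memory.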
Hope it helps.