elastix
No speedup from OpenCL
I recently compiled Elastix 5.0.0 with GPU support, but it does not appear to speed anything up relative to the CPU version. I can use the (Resampler "OpenCLResampler") option, but there is no improvement in speed. The GPU maxes out at 2% utilization.
Also, the (FixedImagePyramid "OpenCLFixedGenericImagePyramid") and (MovingImagePyramid "OpenCLMovingGenericImagePyramid") options give me CL_OUT_OF_RESOURCES errors if I try to use them. When they are enabled, the OpenCLResampler also fails with the same error.
My setup:
OS: CentOS Linux release 7.7.1908 (Core)
GPU: Nvidia GeForce GTX 1080 Ti
CUDA version: 10.2
Relevant compile settings from CMakeCache.txt:
ELASTIX_USE_OPENCL:BOOL=ON
USE_OpenCLFixedGenericPyramid:BOOL=ON
USE_OpenCLMovingGenericPyramid:BOOL=ON
USE_OpenCLResampler:BOOL=ON
I would expect GPU support to significantly reduce execution time. What can I do to speed up Elastix? Also, why are the pyramids failing to run on GPU?
The pyramids likely fail because they need more VRAM than the GPU has available. You may be able to get them to fit in memory if you change FixedInternalImagePixelType and MovingInternalImagePixelType, which may default to float, to a smaller type such as short in the elastix parameter file. The pyramids require a lot of memory for large images and many registration resolutions, and the requirement grows further with a high bit depth such as float.
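To make the pixel-type argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that each pyramid level is stored as a separate image downsampled by its schedule factors, and it ignores any temporary buffers the OpenCL filters may allocate, so treat the result as a lower bound rather than what elastix actually allocates. The function name, the image size, and the schedule are illustrative only.

```python
from math import ceil

# Bytes per voxel for two common internal pixel types.
BYTES_PER_VOXEL = {"short": 2, "float": 4}


def pyramid_bytes(size, schedule, pixel_type):
    """Sum the voxel storage of every pyramid level for one image.

    size:     image size in voxels per axis, e.g. (640, 640, 225)
    schedule: one tuple of per-axis downsampling factors per resolution
    """
    total = 0
    for factors in schedule:
        voxels = 1
        for n, f in zip(size, factors):
            voxels *= ceil(n / f)  # assumed size of the downsampled axis
        total += voxels * BYTES_PER_VOXEL[pixel_type]
    return total


# Example values (assumptions, not measurements).
size = (640, 640, 225)
schedule = [(8, 8, 8), (4, 4, 4), (2, 2, 2), (1, 1, 1)]
for pixel_type in ("float", "short"):
    megabytes = pyramid_bytes(size, schedule, pixel_type) / 1e6
    print(f"{pixel_type}: {megabytes:.1f} MB per image")
```

Under these assumptions a 640 x 640 x 225 image with four levels needs roughly 0.42 GB as float and half of that as short, per image. If the pyramid keeps every level at full size (smoothing only), the total would be closer to four times the full-resolution image.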
As for the speed issues, I have no insight.
I have the same problem. I have compiled Elastix 5.0.0 with GPU support but registration does not appear to be faster compared to the CPU version.
Since I have read somewhere that some people are experiencing performance gains, I wonder if it has to do with specifics of the parameter files, i.e. specific combinations of ImageSampler, ResampleInterpolator, etc.
One thing I notice is that in both cases, (1) (Resampler "DefaultResampler") and (2) (Resampler "OpenCLResampler") (OpenCLResamplerUseOpenCL "true"), only 137 MB are allocated on the GPU. This seems odd to me: for (1) I would expect no memory to be allocated at all, and for (2) it is certainly not enough, which suggests that the computation is not actually happening on the GPU (and might explain the lack of performance gain).
Setup info:
GPU: Nvidia GeForce GTX 1080 Ti
OS: Linux 4.4.0-116-generic (x64) with 128828 MB memory, and 16 cores @ 1200 MHz
CUDA version: 10.1
Image sizes of fixed and moving are 640 640 225
The following is the config I am using for an affine transformation, but I also don't observe any gains for more expensive calculations. (Also ignoring the OpenCL pyramid implementations for now, since they cause the CL_OUT_OF_RESOURCES errors):
// Input images.
(FixedInternalImagePixelType "short")
(FixedImageDimension 3)
(MovingInternalImagePixelType "short")
(MovingImageDimension 3)
(OpenCLDeviceID "3")
(OpenCLDeviceType "GPU")
// Image sampler.
//(ImageSampler "RandomSparseMask")
(ImageSampler "Random")
(ErodeMask "false")
(WriteResultImage "false")
(AutomaticTransformInitialization "true")
(DefaultPixelValue 0)
(HowToCombineTransforms "Compose")
// Components.
(Registration "MultiResolutionRegistration")
(FixedImagePyramid "FixedRecursiveImagePyramid")
//(FixedImagePyramid "OpenCLFixedGenericImagePyramid")
//(OpenCLFixedGenericImagePyramidUseOpenCL "true")
(MovingImagePyramid "MovingRecursiveImagePyramid")
//(MovingImagePyramid "OpenCLMovingGenericImagePyramid")
//(OpenCLMovingGenericImagePyramidUseOpenCL "true")
(Interpolator "BSplineInterpolator")
(Metric "AdvancedMattesMutualInformation")
(Optimizer "StandardGradientDescent")
(ResampleInterpolator "FinalBSplineInterpolator")
//(Resampler "DefaultResampler")
(Resampler "OpenCLResampler")
(OpenCLResamplerUseOpenCL "true")
// Metric parameters.
(NumberOfSpatialSamples 4096)
(NumberOfHistogramBins 32)
// Kind of transform.
(Transform "AffineTransform")
(NumberOfResolutions 4)
(MaximumNumberOfIterations 256)
(MaximumNumberOfSamplingAttempts 10)
(NewSamplesEveryIteration "true")
(UseAllPixels "false")
(BSplineInterpolationOrder 1)
(FinalBSplineInterpolationOrder 1)
(SP_a 500.0)
(SP_A 50.0)
(SP_alpha 0.602)
(AutomaticScalesEstimation "true")
(ImagePyramidSchedule 8 8 8 4 4 4 2 2 2 1 1 1) // XYZ per resolution.
Any pointers would be much appreciated!
Hi, do you know how to release GPU memory during debugging?
Hi @hbraunDSP, @MiHess, @dyliu2016, thanks for the discussion here! The CL_OUT_OF_RESOURCES error (issue #70) was fixed by PR #734, and a bug related to the OpenCL Resampler was fixed by PR #741. @dpshamonin ran some extensive benchmarks for the Resampler, and we noticed a considerable speed improvement, i.e. it was orders of magnitude faster, especially for larger images! Is it possible for you to re-run your code using the latest main branch and let us know if you still observe the same behavior?
Hi @ntatsisk and @dpshamonin: thanks for all the bug fixes/updates. I just built the latest main and no longer get the CL_OUT_OF_RESOURCES error. However, I see no speed up when using the GPU compared to the CPU version.
It seems like it's correctly using the GPU since I get the following lines in the log: e.g.
Fixed pyramid was computed by NVIDIA RTX A4000 from NVIDIA Corporation.
Moving pyramid was computed by NVIDIA RTX A4000 from NVIDIA Corporation.
Preparation of the image pyramids took: 6 ms.
I see that the GPU memory allocation goes up to about 200 MB, but the GPU utilization stays at 0% throughout the registration. The total time it takes is about the same as on the CPU, and there is no error in the log file.
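For anyone who wants to record utilization and memory over the course of a run rather than watching it by eye, here is a small Python sketch that polls nvidia-smi. It assumes nvidia-smi is on the PATH; the function names are made up for illustration. Note that the OpenCL device ID used in the parameter file may not match the index nvidia-smi reports.

```python
import subprocess
import time

# nvidia-smi query that prints one CSV row per GPU: index, utilization %, used MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Turn the CSV rows printed by the query above into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "util_percent": int(util), "memory_mib": int(mem)}
        )
    return rows


def sample_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


def watch(interval_s=0.5, samples=20):
    """Print a few samples while a registration runs in another terminal."""
    for _ in range(samples):
        print(sample_gpus())
        time.sleep(interval_s)
```

Calling watch() in a second terminal while elastix runs gives a coarse trace. Short bursts of GPU work, such as a one-off pyramid computation that takes a few milliseconds, can easily fall between samples.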
These are the modifications I made to the parameter file to use the GPU:
(OpenCLDeviceID "1")
(OpenCLDeviceType "GPU")
(Resampler "OpenCLResampler")
(OpenCLResamplerUseOpenCL "true")
(FixedImagePyramid "OpenCLFixedGenericImagePyramid")
(OpenCLFixedGenericImagePyramidUseOpenCL "true")
(MovingImagePyramid "OpenCLMovingGenericImagePyramid")
(OpenCLMovingGenericImagePyramidUseOpenCL "true")
Did I miss anything?
Also @ntatsisk or @dpshamonin: you mentioned getting multiple orders of magnitude speed improvement. Would you be able to share a test case (e.g. fixed img, moving img, parameter file) so that we could test and benchmark the GPU/OpenCL functionalities of elastix?
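In the meantime, a minimal timing harness along these lines could make results comparable between machines. It is only a sketch: it assumes an elastix executable on the PATH and uses the -f, -m, -p and -out options that appear in the logs in this thread. The parameter file names in the usage note are hypothetical placeholders for a CPU variant and an OpenCL variant of the same file.

```python
import subprocess
import time
from pathlib import Path


def elastix_command(fixed, moving, params, out_dir, exe="elastix"):
    """Build the command line for one elastix run."""
    return [exe, "-f", str(fixed), "-m", str(moving), "-p", str(params), "-out", str(out_dir)]


def time_run(cmd, out_dir):
    """Run one registration and return the wall-clock time in seconds."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)  # elastix expects the folder to exist
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def benchmark(fixed, moving, param_files, out_root="bench"):
    """Time the same image pair once per parameter file; keys are the file stems."""
    timings = {}
    for params in param_files:
        out_dir = Path(out_root) / Path(params).stem
        cmd = elastix_command(fixed, moving, params, out_dir)
        timings[Path(params).stem] = time_run(cmd, out_dir)
    return timings
```

Usage would be something like benchmark("fixed.mha", "moving.mha", ["params_cpu.txt", "params_opencl.txt"]). Wall-clock time covers the whole run, so a faster resampler or pyramid only shows up if those steps are a noticeable share of the total; the per-stage timings in elastix.log are more informative for that.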
UPDATE:
It turns out I still get the CL_OUT_OF_RESOURCES error if I set the pyramid schedule differently. The maximum allocation only reaches about 3 GB, whereas the GPU has 16 GB. I also tried a machine with much more GPU memory (48 GB), and it did not work there either.
The build is straight from the GitHub repo main branch, so I'm not sure what's going on. Getting some example/test case (images and parameters) would be tremendously helpful!
error: in function: opencl_context_notify
Details: OpenCL error during context creation or runtime:
CL_OUT_OF_RESOURCES error executing CL_COMMAND_WRITE_BUFFER on NVIDIA RTX A4000 (Device 0).
First of all thanks for all the work on this @ntatsisk and @dpshamonin.
I have built the latest version and did some experimenting, but unfortunately I am still getting the CL_OUT_OF_RESOURCES errors. Just like @urlicht described, the GPU memory allocation for me also goes up to about 3 GB (of 12 GB total).
I also tried with images and a modified parameter file from the ITKElastix examples to facilitate reproducing/tracking down the error. Source: https://github.com/InsightSoftwareConsortium/ITKElastix/tree/main/examples/data
From the elastix.log file:
which elastix: elastix
elastix version: 5.0.1
Git revision SHA: b581b9242157cabc3e029bd9eeeef987479ed195
Git revision date: Thu Dec 22 21:27:19 2022 +0100
Build date: Dec 29 2022 18:49:16
Compiler: GCC version 11.3.0
Memory address size: 64-bit
CMake version: 3.25.0-rc1
ITK version: 5.3.0
ELASTIX version: 5.0.1
Command line options from ElastixBase:
-f data/CT_3D_lung_fixed.mha
-m data/CT_3D_lung_moving.mha
-fMask data/CT_3D_lung_fixed_mask.mha
-mMask data/CT_3D_lung_moving_mask.mha
-out itkelastix_example/
-p data/registration/parameters.3D.NC.affine.ASGD.001.txt
-threads unspecified, so all available threads are used
Command line options from TransformBase:
-t0 unspecified, so no initial transform used
The only changes made to parameters.3D.NC.affine.ASGD.001.txt are the following:
(OpenCLDeviceID "0")
(OpenCLDeviceType "GPU")
(FixedImagePyramid "OpenCLFixedGenericImagePyramid")
(OpenCLFixedGenericImagePyramidUseOpenCL "true")
(MovingImagePyramid "OpenCLMovingGenericImagePyramid")
(OpenCLMovingGenericImagePyramidUseOpenCL "true")
(Resampler "OpenCLResampler")
(OpenCLResamplerUseOpenCL "true")
//(FixedImagePyramid "FixedRecursiveImagePyramid")
//(MovingImagePyramid "MovingRecursiveImagePyramid")
//(Resampler "DefaultResampler")
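To apply the same switches to other parameter files without hand-editing, a small Python helper like the following could be used. It is a sketch that assumes one `(Key value)` entry per line and `//` comments, as in the files shown in this thread; the function name is made up, and any trailing comment on a replaced line is dropped.

```python
import re

# Matches an active (non-commented) parameter line and captures its key.
PARAM = re.compile(r"^\s*\((\w+)\s")

# The OpenCL-related settings listed above; values keep their quotes.
OPENCL_OVERRIDES = {
    "OpenCLDeviceID": '"0"',
    "OpenCLDeviceType": '"GPU"',
    "FixedImagePyramid": '"OpenCLFixedGenericImagePyramid"',
    "OpenCLFixedGenericImagePyramidUseOpenCL": '"true"',
    "MovingImagePyramid": '"OpenCLMovingGenericImagePyramid"',
    "OpenCLMovingGenericImagePyramidUseOpenCL": '"true"',
    "Resampler": '"OpenCLResampler"',
    "OpenCLResamplerUseOpenCL": '"true"',
}


def apply_overrides(text, overrides):
    """Replace existing entries for the given keys and append missing ones."""
    seen = set()
    out = []
    for line in text.splitlines():
        match = PARAM.match(line)
        if match and match.group(1) in overrides:
            key = match.group(1)
            out.append(f"({key} {overrides[key]})")
            seen.add(key)
        else:
            out.append(line)  # comments and unrelated parameters pass through
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"({key} {value})")
    return "\n".join(out) + "\n"
```

Reading a parameter file, passing its text through apply_overrides(text, OPENCL_OVERRIDES), and writing the result to a new file gives a CPU file and an OpenCL file that differ only in these entries.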
And this results in the following errors:
ERROR: Exception during updating GPU fixed pyramid calculation:
itk::ExceptionObject (0x557312702d00)
Location: "unknown"
File: /home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkGPUDataManager.cxx
Line: 240
Description: CL_OUT_OF_RESOURCES
WARNING: The fixed pyramid computation with OpenCL failed due to the error.
The OpenCLFixedGenericImagePyramid is switching back to CPU mode.
ERROR: Exception during creating GPU input image for moving generic pyramid:
itk::ExceptionObject (0x557312a87b40)
Location: "unknown"
File: /home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkGPUImageDataManager.hxx
Line: 167
Description: CL_OUT_OF_RESOURCES
WARNING: Unable to configure the GPU.
The OpenCLMovingGenericImagePyramid is switching back to CPU mode.
I noticed that there are additional errors shown in the console output:
/home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkOpenCLContext.cxx(165): itkOpenCL generic error.
Error: in function: opencl_context_notify
Details: OpenCL error during context creation or runtime:
CL_OUT_OF_RESOURCES error executing CL_COMMAND_WRITE_BUFFER on NVIDIA GeForce GTX 1080 Ti (Device 0).
/home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkOpenCLContext.cxx(165): itkOpenCL generic error.
Error: in function: opencl_context_notify
Details: OpenCL error during context creation or runtime:
Unknown error executing clFlush on NVIDIA GeForce GTX 1080 Ti (Device 0).
ERROR: Exception during updating GPU fixed pyramid calculation:
itk::ExceptionObject (0x557312702d00)
Location: "unknown"
File: /home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkGPUDataManager.cxx
Line: 240
Description: CL_OUT_OF_RESOURCES
WARNING: The fixed pyramid computation with OpenCL failed due to the error.
The OpenCLFixedGenericImagePyramid is switching back to CPU mode.
/home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkOpenCLContext.cxx(165): itkOpenCL generic error.
Error: in function: opencl_context_notify
Details: OpenCL error during context creation or runtime:
CL_OUT_OF_RESOURCES error executing CL_COMMAND_WRITE_BUFFER on NVIDIA GeForce GTX 1080 Ti (Device 0).
ERROR: Exception during creating GPU input image for moving generic pyramid:
itk::ExceptionObject (0x557312a87b40)
Location: "unknown"
File: /home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkGPUImageDataManager.hxx
/home/mirco/elastix/elastix/Common/OpenCL/ITKimprovements/itkOpenCLContext.cxx(165): itkOpenCL generic error.
Error: in function: opencl_context_notify
Details: OpenCL error during context creation or runtime:
Unknown error executing clFlush on NVIDIA GeForce GTX 1080 Ti (Device 0).
Line: 167
Description: CL_OUT_OF_RESOURCES
WARNING: Unable to configure the GPU.
The OpenCLMovingGenericImagePyramid is switching back to CPU mode.
Preparation of the image pyramids took: 128 ms.
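Because the fallback to CPU mode is only a warning, it is easy to miss in a long log. The following Python sketch scans log text for the messages quoted in this thread. The patterns are taken from these excerpts only, so other elastix versions may word them differently; the function name is illustrative.

```python
import re

# Message fragments copied from the log excerpts in this thread.
COMPUTED_BY = re.compile(r"(Fixed|Moving) pyramid was computed by (.+?) from ")
FELL_BACK = re.compile(r"The (\w+) is switching back to CPU mode")


def summarize_log(text):
    """Report which pyramids ran on a GPU and which components fell back to CPU."""
    return {
        "gpu_pyramids": [(m.group(1), m.group(2)) for m in COMPUTED_BY.finditer(text)],
        "cpu_fallbacks": FELL_BACK.findall(text),
        "out_of_resources": text.count("CL_OUT_OF_RESOURCES"),
    }
```

Running summarize_log on the contents of elastix.log gives a quick yes/no on whether the pyramids were computed on the GPU. An empty gpu_pyramids list together with non-empty cpu_fallbacks means the run silently continued on the CPU.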
@ntatsisk and @dpshamonin, do you see where it might go wrong? Are you not getting those errors anymore on the example above? Or could you please share a set of images and parameter file for further testing?
Thanks in advance!
Apologies for the late reply. I tried to reproduce the error but I couldn't. I used the setup that @MiHess shared, where the data come from https://github.com/InsightSoftwareConsortium/ITKElastix/tree/main/examples/data ("CT_3D_lung"), and I attach the exact parameter file. It is the same as @MiHess's, except that I changed it to (WriteResultImage "true") so that the resampler is also triggered. I gave it a go on both a Windows and an Ubuntu machine, and I attach the corresponding logs. The logs are from the executables, but I also tested the library versions and again saw no error.
Here are the files: parameters.3D.NC.affine.ASGD.001_OpenCL.txt, log_executable_windows.log, log_executable_ubuntu.log
Fixed pyramid was computed by NVIDIA GeForce RTX 3090 from NVIDIA Corporation .
Moving pyramid was computed by NVIDIA GeForce RTX 3090 from NVIDIA Corporation .
Preparation of the image pyramids took: 334 ms.
As you can see the GPU was used normally.
The (Windows) setup:
which elastix: C:\Users\kntatsis\work\opencl-error-test\elastix.exe
elastix version: 5.1.0
Git revision SHA: d652938573e5f193955908eba225a854b31ce36a
Git revision date: Thu Jan 12 14:20:18 2023 +0100
Build date: Feb 14 2023 14:34:23
Compiler: Visual C++ version 193331630.0
Memory address size: 64-bit
CMake version: 3.25.0-rc1
ITK version: 5.3.0
Note that I am using Elastix version 5.1.0, which was released recently. It shouldn't be different from the commit that you used, but it is easier to reference.
@urlicht Can you share the exact pyramid setup that triggered the error in your case?
Looking forward to your replies. I will try to be more responsive so that we get to solve this issue after all this time ;)