elastix
elastix copied to clipboard
BUG: Fix OpenCL pyramids (original: https://github.com/SuperElastix/elastix/pull/243)
This PR provides the cosmetic changes needed for PR https://github.com/SuperElastix/elastix/pull/243 together with a small addition. Hence, full credits should be given to @squll1peter for writing up the original PR and resolving the CL_OUT_OF_RESOURCES issue discussed in PR https://github.com/SuperElastix/elastix/pull/243 and issue https://github.com/SuperElastix/elastix/issues/70! Two different bugs are actually solved. In short summary:
-
The CL_OUT_OF_RESOURCES error. This bug was created because
itk::SmoothingRecursiveGaussianImageFilter
was overriding (here: https://github.com/InsightSoftwareConsortium/ITK/blob/3c6de8f009aadfab8f801070237cb699f5c895cb/Modules/Filtering/Smoothing/include/itkSmoothingRecursiveGaussianImageFilter.h#L80-L87) the internal pixel type ofRecursiveGaussianImageFilter
member objects using the templated structRebind
. However,Rebind
was only implemented foritk::Image
and notitk::GPUImage
, making the latter call the implementation of the former and leading to mixing of the two types. The solution here was to create the equivalentRebind
foritk::GPUImage
-
A crash was appearing during the start of the second registration resolution because only the first output of the pyramid member object was communicated/grafted to the FixedGenericPyramid / MovingGenericPyramid objects. A simple for-loop to graft all outputs was added as solution.
In addition to the above bug-fixing changes, the class signature of itkGPUResampleImageFilter was updated to match the one of itkResampleImageFilter where an extra templated parameter was added (TTransformPrecisionType
). The itkGPUResampleImageFilterFactory
was updated accordingly.
An important note: Several of the related tests were passing successfully before this changes, where clearly they shouldn't. So, some work needs to be done there to make them effective.
A second note is that memory allocation enhancement, provided in the original PR, is not implemented here, but it should be looked into the future.
Great work!
A second note is that memory allocation enhancement, provided in the original PR, is not implemented here, but it should be looked into the future.
I think a new test will make this bug visible, when run on a modern GPU.
This PR & comment summarized the reasoning and code elegantly, I really appreciate it!
I added some changes to the pyramid tests because I noticed they were all passing before this PR, which clearly showed that the bug/crash in the code was not really covered in the tests.
The relevant test here is the GPUGenericMultiResolutionPyramid
test. The initial reason that it was passing was that in fact the CPU version of the Resampler was being called instead of the GPU, which indicates that the corresponding Factory+overloaded function (this one: https://github.com/SuperElastix/elastix/blob/fdd71315d26ef61b814f77ec939c0e10951e537b/Common/OpenCL/Factories/itkGPUResampleImageFilterFactory.h#L84-L88) was not doing its job. Looking into it further, it appears that PrecisionType
was not specified for the GenericMultiResolutionPyramid
inside the test, which in turn meant that default double
was used. Still, the corresponding Factory should have worked, but it seems that it doesn't, which is a bug. Here, I just added the PrecisionType=float
in order to call a different overloaded function of the Factory, which in fact works (the same used inside the code). So, in this way we avoid using the problematic factory but still the bug is there (maybe it is in the ITK side, since all the factor magic happens inside the itkObjectFactoryBase.h
file).
As expected, this change allowed the GPUResampler to get called as it should. However, the test was still passing successfully! Reminder: the equivalent part of the code crashes. Why? I don't understand fully all the details, but the high-level difference is that inside the code we call GenericMultiResolutionPyramidImageFilter<GPUInputImageType, GPUOutputImageType, PrecisionType>
while inside the test: GenericMultiResolutionPyramidImageFilter<InputImageType, OutputImageType, PrecisionType>
. So, I added an extra test to cover also for the case of calling the pyramid filter with itk::GPUImage
s instead of itk::Image
s. This new test fails as expected before the changes introduced here, and passes after the changes in this PR.
To explain a bit better the problem with the factory registration, I created a minimal example:
#include "itkProcessObject.h"
#include "itkObjectFactoryBase.h"
#include "itkImage.h"
namespace itk
{
// MyFilter
template <typename TInputImage, typename TParameter=double>
class MyFilter : public ProcessObject
{
public:
using Self = MyFilter;
using Superclass = ProcessObject;
using Pointer = SmartPointer<Self>;
using ConstPointer = SmartPointer<const Self>;
itkNewMacro(Self);
itkTypeMacro(MyFilter, ProcessObject);
}; // end MyFilter
// SubFilter
template <typename TInputImage, typename TParameter=double>
class SubFilter : public MyFilter<TInputImage, TParameter>
{
public:
using Self = SubFilter;
using Superclass = MyFilter;
using Pointer = SmartPointer<Self>;
using ConstPointer = SmartPointer<const Self>;
itkNewMacro(Self);
itkTypeMacro(SubFilter, MyFilter);
}; // end SubFilter
// SubFilterFactory
class SubFilterFactory : public itk::ObjectFactoryBase
{
public:
using Self = SubFilterFactory;
using Superclass = itk::ObjectFactoryBase;
using Pointer = itk::SmartPointer<Self>;
using ConstPointer = itk::SmartPointer<const Self>;
itkFactorylessNewMacro(Self);
itkTypeMacro(SubFilterFactory, itk::ObjectFactoryBase);
static void
RegisterOneFactory()
{
auto factory = SubFilterFactory::New();
itk::ObjectFactoryBase::RegisterFactory(factory);
}
const char *
GetITKSourceVersion() const override
{
return "itk version 5.3.0";
}
const char *
GetDescription() const override
{
return "A Factory";
}
private:
SubFilterFactory()
{
using InputImage = Image<float, 3>;
// double -> float
this->RegisterOverride(typeid(MyFilter<InputImage, double>).name(),
typeid(SubFilter<InputImage, float>).name(),
" ",
true,
CreateObjectFunction<SubFilter<InputImage, float>>::New());
// float -> float
this->RegisterOverride(typeid(MyFilter<InputImage, float>).name(),
typeid(SubFilter<InputImage, float>).name(),
" ",
true,
CreateObjectFunction<SubFilter<InputImage, float>>::New());
}
}; // end SubFilterFactory
} // end itk
int
main(int argc, char * argv[])
{
using InputImage = itk::Image<float, 3>;
using MyFilterFloatType = itk::MyFilter<InputImage, float>;
using MyFilterDoubleType = itk::MyFilter<InputImage, double>;
// Instantiate MyFilter class as usual
auto myFilterFloat = MyFilterFloatType::New();
auto myFilterDouble = MyFilterDoubleType::New();
std::cout << "Before factory registration:" << std::endl;
std::cout << typeid(*myFilterFloat).name() << std::endl;
std::cout << typeid(*myFilterDouble).name() << "\n" << std::endl;
// Register factory
itk::SubFilterFactory::RegisterOneFactory();
// From now on, only SubFilters should be instantiated
auto subFilterFloat = MyFilterFloatType::New();
auto subFilterDouble = MyFilterDoubleType::New();
std::cout << "After factory registration:" << std::endl;
std::cout << typeid(*subFilterFloat).name() << std::endl;
std::cout << typeid(*subFilterDouble).name() << std::endl;
} // end main()
The output is:
Before factory registration:
class itk::MyFilter<class itk::Image<float,3>,float>
class itk::MyFilter<class itk::Image<float,3>,double>
After factory registration:
class itk::SubFilter<class itk::Image<float,3>,float>
class itk::MyFilter<class itk::Image<float,3>,double>
As you can see the conversion from double->float fails, while the float->float succeeds. @N-Dekker, what are your thoughts? How should we move forward?
I see, the dynamic_cast from SubFilter<itk::Image<float,3>,float>*
to MyFilter<itk::Image<float,3>,double>>*
fails, in ObjectFactory<itk::MyFilter<itk::Image<float,3>,double>>::Create()
. At https://github.com/InsightSoftwareConsortium/ITK/blob/1e003467e263d31bfb02589ce2449a413dbe4fa2/Modules/Core/Common/include/itkObjectFactory.h#L63
After some discussions with @N-Dekker, we agreed to have simply one test for GPUGenericMultiResolutionPyramid
instead of two i.e. to keep the one which covers how the filter is instantiated in the actual elastix code. In addition, some duplicate code was simplified in that test. Lastly, based on the discussion regarding the factories above , I added a commit that removes all GPUResampler factories that were not functional (=they were attempting without success to override double with float) and, because of that, the GPUFactories test had to be updated as well.
If there are no further comments/questions/discussion points, I think the PR can be merged. :)