ITKUltrasound
ITKUltrasound copied to clipboard
WIP: PERF: Use NeighborhoodRange for metric image computation
Depends on:
http://review.source.kitware.com/#/c/23795/4
Currently 4X slower
Closes #97
CC: @N-Dekker @phcerdan
Experimenting with the proposed neighborhood range constant pixel access policy. :microscope: Currently, this seems to be quite a bit slower, though. :snail:
Hey @thewtex, what script/test are you using for testing the performance?
@phcerdan itkBlockMatchingBayesianRegularizationDisplacementCalculatorTest
.
This use case might suggest adding ShapedImageNeighborhoodRange::SetLocation(const IndexType&)
, so that the ranges could be declared and initialized outside the loop, and then have their location updated inside the loop, by calling range.SetLocation(denomIt.GetIndex())
for each range.
Update: The member function is added with patch set 5: http://review.source.kitware.com/#/c/23795/5
Great ideas @N-Dekker ! :bulb:
And thanks for adding the SetLocation
patch to move reconstruction out of the loop. I will try that and manually expanding the inner product. This will probably happen next week.
On a related note, I was examining the patch and the existing code, and the construction of the proxy in operater*()
:
https://github.com/InsightSoftwareConsortium/ITK/blob/a5d40b033f809b8d54e0bda9bd1eb5e7613faf69/Modules/Core/Common/include/itkShapedImageNeighborhoodRange.h#L348-L351
Do you think the construction here has performance implications and would it be avoidable?
I was examining the patch and the existing code, and the construction of the proxy in
operater*()
: Do you think the construction here has performance implications and would it be avoidable?
The use of a proxy as return type of operater*()
is hard to avoid, for a constant-boundary iterator. Because operater*()
cannot return a 'real' reference to an existing pixel from the image, when trying to retrieve a pixel value outside the image boundaries.
However, looking at http://review.source.kitware.com/#/c/23795/5/Modules/Core/Common/include/itkConstantBoundaryImageNeighborhoodPixelAccessPolicy.h my intuition tells me that it might be possible to squeeze some CPU cycles out of the private helper function CalculatePixelIndexValue
, which is called with every operater*()
. CalculatePixelIndexValue
has an if statement and a return inside its for loop which might have performance implications... but I'm not sure! The compiler optimizer might be smarter than my intuition!
Update: With patch set 7 I adjusted the private helper function CalculatePixelIndexValue
, removing the return from inside its for loop, and indeed, it appears to significantly improve the performance! Using VS2017 64-bit Release, I observed a reduction of more than 30% of the run-time duration!
Experimenting with the proposed neighborhood range constant pixel access policy. 🔬 Currently, this seems to be quite a bit slower, though. 🐌
@thewtex Which compiler do you use? (Release build, optimized for speed?) I'm asking because I observed significant PERF differences between VS2015 and VS2017 (both 64-bit Release), while running the example code that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27
@N-Dekker this was Clang / MinSizeRel build. I will try other compilers and other build configurations, along with your example code!
@thewtex I'm interested to see your results! Using VS2017 Release, I found that NeighborhoodRange based iteration was almost 3x faster than the old-school itk::ConstNeighborhoodIterator
. Unfortunately, I have bad news... or is it good news?!? With the ConstNeighborhoodIterator patch http://review.source.kitware.com/#/c/23813/ that I just posted, the old-school iteration actually beats the NeighborhoodRange!!! At least when using VS2017 Release, on the example that I posted at https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/27 However, there are still some good reasons to consider NeighborhoodRange:
- The range class supports very fast change of location to an arbitrary position, especially when we add
ShapedImageNeighborhoodRange::SetLocation
(while the old itk::ConstNeighborhoodIterator::SetLocation is clearly "not optimized for speed"). - It is thread-safe (wheras the old itk::ConstNeighborhoodIterator has unsafe mutable data).
- It has a modern C++ interface.
Performance results discussed here:
https://discourse.itk.org/t/custom-border-extrapolation-of-shapedimageneighborhoodrange-by-imageneighborhoodpixelaccesspolicy/879/29