niftyreg icon indicating copy to clipboard operation
niftyreg copied to clipboard

Rearchitect F3D to reinstate GPU

Open onurulgen opened this issue 2 years ago • 10 comments

onurulgen avatar Nov 11 '22 11:11 onurulgen

@onurulgen As mentioned earlier - I am finally able to allocate some time to NiftyReg.

Looking at differences between this branch and master, it appears some discrepancies happen when using lncc as a measure of similarity between the two branches. I will add relevant tests to pinpoint the potential error(s).

mmodat avatar Jun 21 '23 13:06 mmodat

@onurulgen When you have a chance, could you go through the list of what still needs to be done before we finalise this branch and merge it back? Please add it as a list of tasks and we can get cracking.

mmodat avatar Jul 12 '23 08:07 mmodat

@onurulgen As mentioned earlier - I am finally able to allocate some time to NiftyReg.

Looking at differences between this branch and master, it appears some discrepancies happen when using lncc as a measure of similarity between the two branches. I will add relevant tests to pinpoint the potential error(s).

Fixed in e6855af5d45634d6c35b913c1c1f746b61208039

onurulgen avatar Jul 12 '23 11:07 onurulgen

The symmetric scheme isn't supported for these classes:

  • [x] reg_nmi_gpu
  • [x] reg_ssd_gpu
  • [x] reg_optimiser_gpu
  • [x] reg_conjugateGradient_gpu

CUDA implementations needed:

  • [x] CudaCompute::GetDefFieldFromVelocityGrid()
  • [x] CudaCompute::ApproxLinearEnergy()
  • [x] CudaCompute::ApproxLinearEnergyGradient()
  • [x] CudaCompute::SmoothGradient()
  • [x] CudaCompute::ConvolveVoxelBasedMeasureGradient()
  • [x] CudaCompute::ExponentiateGradient()
  • [x] CudaCompute::SymmetriseVelocityFields()
  • [x] CudaCompute::UpdateVelocityField()
  • [ ] reg_lncc_gpu
  • [x] reg_optimiser_gpu::Perturbation()
  • [ ] CudaCompute::GetLandmarkDistance()
  • [ ] CudaCompute::LandmarkDistanceGradient()
  • [ ] CudaCompute::GetApproximatedGradient()
  • [ ] CudaCompute::BchUpdate()
  • [ ] reg_mindssc_gpu
  • [ ] reg_mind_gpu
  • [ ] reg_kld_gpu
  • [ ] reg_dti_gpu

Incomplete implementations:

  • [x] Fix reg_spline_getDeformationField_gpu to accept composition
  • [x] Fix reg_getImageGradient_gpu to accept interpolation and timepoints

onurulgen avatar Jul 13 '23 10:07 onurulgen

@onurulgen The block matching test added above fails in Release., but runs fine in Debug. There might be a trouble with CPU or CUDA block matching. I'll add some unit tests for this as well (but unlikely this week).

mmodat avatar Jul 14 '23 13:07 mmodat

Morning @onurulgen,

I was writing a test for the NMI gradient computation and found out that measures have to be initialised with a F3DContent. I think that NMI (and the other measures) should use the base Content instead. As we discussed last week, the measures computations do not need to know about the transformation model as all is based on a voxel-based approach. Would you be able to change this throughout? Happy to have a chat later today if this is not clear or you don't agree. Thanks

mmodat avatar Jul 26 '23 09:07 mmodat

We can't put into Content since it fills AladinContent with unnecessary stuff. I'll reorganise Content classes as

Content

onurulgen avatar Jul 26 '23 11:07 onurulgen

Morning @onurulgen,

I was writing a test for the NMI gradient computation and found out that measures have to be initialised with a F3DContent. I think that NMI (and the other measures) should use the base Content instead. As we discussed last week, the measures computations do not need to know about the transformation model as all is based on a voxel-based approach. Would you be able to change this throughout? Happy to have a chat later today if this is not clear or you don't agree. Thanks

Done!

onurulgen avatar Aug 03 '23 13:08 onurulgen

@onurulgen Not sure what the rationale of the latest change - so I reverted some part, documented the rational and checked that it gave the same output that 1 month ago.

I also ran a few tests, in short cross-registration across 10 images and thus 90 pairs (only finished 71 so far). For each I ran the master, the current CPU and the current GPU. See plot below: image

Times (std):
master=127.54166666666667 seconds (9.948391131791668)
cpu=133.31944444444446 seconds (15.094909359789044)
gpu=88.0 seconds (7.8828223935903114)
Values (std):
master=1.1980736111111112 (0.01530594514464343)
cpu=1.1974633333333333 (0.009458200043929492)
gpu=1.1981574999999998 (0.009576417214525837)
Average Dice (std):
master=0.5513964023433462 (0.022439108017167943)
cpu=0.5509514152109074 (0.02281113854459689)
gpu=0.5508319071958537 (0.022840715409729245)
Paired t-test times:
time master vs cpu TtestResult(statistic=-3.206822322398328, pvalue=0.0020134464080249436, df=71)
time cpu vs gpu TtestResult(statistic=26.17402476630229, pvalue=3.3581916812498626e-38, df=71)
time master vs gpu TtestResult(statistic=27.93304352596221, pvalue=4.962584143639594e-40, df=71)
Paired t-test values:
value master vs cpu TtestResult(statistic=1.7197269701738718, pvalue=0.0898387349019013, df=71)
value master vs gpu TtestResult(statistic=-0.2290983106625733, pvalue=0.8194511562370143, df=71)
value cpu vs gpu TtestResult(statistic=-2.0245956146751567, pvalue=0.04667132334622872, df=71)
Paired t-test Dice:
Dice master vs cpu TtestResult(statistic=1.1917984840400984, pvalue=0.23730876159415193, df=71)
Dice cpu vs gpu TtestResult(statistic=0.34667166294341434, pvalue=0.7298633632025944, df=71)
Dice master vs gpu TtestResult(statistic=1.5219082335295127, pvalue=0.1324736129155228, df=71)

Not sure when master and this branch CPU diverged - worth investigating.

mmodat avatar Sep 08 '23 12:09 mmodat

Thanks for lastest commit - will check and add tests.

mmodat avatar Sep 11 '23 09:09 mmodat