FourierConvolutionCUDALib

MVD fails with odd dimensions

Open: tibuch opened this issue 9 years ago (1 comment)

Hello @StephanPreibisch

I ran a multiview deconvolution (MVD) on an NVIDIA GeForce GTX Titan Black. If I set the block size in one or more dimensions to an odd number, the deconvolved image is only black and white. This also happens if one or more dimensions of the cropped image are odd and the block size is set to "entire image at once".
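The report does not say where the parity matters. One plausible place, offered only as an assumption and not checked against the library's source, is the half-spectrum of a real-to-complex FFT: for a last axis of length n, cuFFT and FFTW store n // 2 + 1 complex values, and the inverse (complex-to-real) transform must be given n explicitly. Code that instead infers n as 2 * (m - 1) from the complex length m silently assumes n is even. The helper names below are hypothetical.

```python
def r2c_shape(shape):
    """Shape of the half-spectrum from a real-to-complex FFT.

    Only the last axis is halved (cuFFT/FFTW row-major convention): n -> n // 2 + 1.
    """
    *lead, last = shape
    return (*lead, last // 2 + 1)


def naive_c2r_length(m):
    # Hypothetical buggy inference of the real length from the complex length m.
    # It is only correct when the original real length was even.
    return 2 * (m - 1)


for n in (512, 513, 591):
    m = r2c_shape((n,))[0]
    guessed = naive_c2r_length(m)
    print(n, m, guessed, "OK" if guessed == n else "MISMATCH")
```

Both 512 and 513 map to 257 complex values, so the original length cannot be recovered from the spectrum alone; an odd extent has to be carried through to the inverse transform explicitly.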

This is a screenshot of the log and the deconvolved image (513x513x513):

If the MVD is computed on the CPU, it works with odd sizes.
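Since even sizes appear to work on the GPU, a stop-gap worth trying (an assumption drawn from this report, not a documented fix) is to grow the cropped bounding box so that every extent is even before running the CUDA deconvolution. The sketch below assumes the box is given as inclusive min/max corners, as the "Min:"/"Max:" log lines suggest; the function names are hypothetical.

```python
def even_extent(n):
    """Round an extent up to the next even number (unchanged if already even)."""
    return n + (n % 2)


def even_bounding_box(min_corner, max_corner):
    """Grow an inclusive [min, max] box so that every extent is even.

    Assumes inclusive corners: extent = max - min + 1. Only the max corner moves.
    """
    new_max = []
    for lo, hi in zip(min_corner, max_corner):
        size = hi - lo + 1
        new_max.append(lo + even_extent(size) - 1)
    return tuple(min_corner), tuple(new_max)


box = even_bounding_box((0, 0, 0), (512, 512, 512))  # a 513^3 box
print(box)
```

Growing the box by one voxel per odd axis keeps all original data inside it; shrinking instead would also work but discards a slice.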

Log output:

Using spimdata version: 0.9-revision
Using spimreconstruction version: 2.3.9
angles selected: 0, 90
channels selected: 0
illuminations selected: 0
Timepoints selected: 0
Fri Jul 17 13:00:39 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A000_stack.tif' ...
Fri Jul 17 13:00:39 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A000_stack.tif' [1024x1024x115 image=ArrayImg<UnsignedShortType>]
Fri Jul 17 13:00:40 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A090_stack.tif' ...
Fri Jul 17 13:00:41 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A090_stack.tif' [1024x1024x115 image=ArrayImg<UnsignedShortType>]
Fri Jul 17 13:00:42 CEST 2015: Estimating Bounding Box for Fusion. If size of images is not known (they were never opened before), some of them need to be opened once to determine their size.
Min: (356, 184, 80)
Max: (867, 695, 591)
Fri Jul 17 13:00:44 CEST 2015: Estimating Bounding Box for Fusion. If size of images is not known (they were never opened before), some of them need to be opened once to determine their size.
Looking for native libraries ending with '.dll' in directory: 'D:\Hannah\CUDA_fourierdeconvolution\Release' ... 
Trying to load following library: D:\Hannah\CUDA_fourierdeconvolution\Release\Convolution3D_fftCUDAlib.dll
Using an outdated version of the CUDA libs, cannot query free memory. Assuming total memory.
Using device GeForce GTX TITAN Black (id=0, mem=6144MB (6144MB free), CUDA capability 3.5)
beads --- channel: 0 angle: 0 illum: 0 timepoint: 0: 37 correspondences.
beads --- channel: 0 angle: 90 illum: 0 timepoint: 0: 37 correspondences.

Found 1 label(s) with correspondences for channel 0: 
Label 'beads' (channel 0) has 2/2 views with corresponding detections.
Channel 0: extract PSF from label 'beads'
Fusing view 0 of 1
(Fri Jul 17 13:01:07 CEST 2015): Reserving memory for fused & weight image.
Fri Jul 17 13:01:07 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A000_stack.tif' ...
Fri Jul 17 13:01:07 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A000_stack.tif' [1024x1024x115 image=ArrayImg<FloatType>]
Extracting PSF for viewsetup 0 using label 'beads' (37 corresponding detections available)
PSF size: (5, 5, 9)
Fusing view 1 of 1
(Fri Jul 17 13:01:24 CEST 2015): Reserving memory for fused & weight image.
Fri Jul 17 13:01:24 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A090_stack.tif' ...
Fri Jul 17 13:01:24 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A090_stack.tif' [1024x1024x115 image=ArrayImg<FloatType>]
Extracting PSF for viewsetup 1 using label 'beads' (37 corresponding detections available)
PSF size: (5, 5, 9)
Minimal number of overlapping views: 2, using 2
Average number of overlapping views: 2.0, using 2.0
Type of iteration: OPTIMIZATION_I
Number iterations: 10
OSEM speedup: 1.0
Using blocks: false
Using CUDA: true
Blending border: 2x2x1
Blending range: 25x25x25
PSF size (extracting): 5x5x9
Channel 0 extracts from label 'beads'. 
ImgLib container (deconvolved): ArrayImgFactory
Using Tikhonov regularization (lambda = 0.0060)
maxSize: (75, 7, 79)
Fri Jul 17 13:01:42 CEST 2015: numThreads = 16
Fri Jul 17 13:01:45 CEST 2015: done normalizing.
Min number of overlapping views: 2
Average number of overlapping views: 2.0
Average intensity in overlapping area: 0.01641087
OSEM acceleration: 1.0
Deconvolved image container: ArrayContainerFactory
iteration: 0 (Fri Jul 17 13:01:45 CEST 2015)
iteration: 0 --- sum change: NaN --- max change per pixel: NaN
iteration: 1 (Fri Jul 17 13:01:52 CEST 2015)
iteration: 1 --- sum change: NaN --- max change per pixel: NaN
iteration: 2 (Fri Jul 17 13:01:58 CEST 2015)
iteration: 2 --- sum change: NaN --- max change per pixel: NaN
iteration: 3 (Fri Jul 17 13:02:04 CEST 2015)
iteration: 3 --- sum change: NaN --- max change per pixel: NaN
iteration: 4 (Fri Jul 17 13:02:12 CEST 2015)
iteration: 4 --- sum change: NaN --- max change per pixel: NaN
iteration: 5 (Fri Jul 17 13:02:19 CEST 2015)
iteration: 5 --- sum change: NaN --- max change per pixel: NaN
iteration: 6 (Fri Jul 17 13:02:25 CEST 2015)
iteration: 6 --- sum change: NaN --- max change per pixel: NaN
iteration: 7 (Fri Jul 17 13:02:32 CEST 2015)
iteration: 7 --- sum change: NaN --- max change per pixel: NaN
iteration: 8 (Fri Jul 17 13:02:38 CEST 2015)
iteration: 8 --- sum change: NaN --- max change per pixel: NaN
iteration: 9 (Fri Jul 17 13:02:45 CEST 2015)
iteration: 9 --- sum change: NaN --- max change per pixel: NaN
DONE (Fri Jul 17 13:02:53 CEST 2015).

Console output:

fmax = 2.0
fmin = 0.0
imgSize (513, 513, 513)
kernelSize (5, 5, 79)
blockSize (517, 517, 591)
numBlocks (1, 1, 1)
effectiveSize (513, 513, 513)
effectiveLocalOffset (2, 2, 39)
imgSize (513, 513, 513)
kernelSize (75, 7, 7)
blockSize (587, 519, 519)
numBlocks (1, 1, 1)
effectiveSize (513, 513, 513)
effectiveLocalOffset (37, 3, 3)
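The sizes printed above are consistent with blockSize = imgSize + kernelSize - 1 and effectiveLocalOffset = kernelSize // 2 in every axis. That relation is inferred from these numbers alone, not read from the library; the sketch below just reproduces the arithmetic and lists which block axes end up odd.

```python
def block_geometry(img_size, kernel_size):
    """Block size and local offset as they appear to be derived in the console output.

    Inferred from the printed values only: the block covers the full linear
    convolution support (img + kernel - 1), and the image sits kernel // 2 in.
    """
    block = tuple(i + k - 1 for i, k in zip(img_size, kernel_size))
    offset = tuple(k // 2 for k in kernel_size)
    return block, offset


img = (513, 513, 513)
for kernel in ((5, 5, 79), (75, 7, 7)):
    block, offset = block_geometry(img, kernel)
    odd_axes = [axis for axis, extent in enumerate(block) if extent % 2]
    print(f"kernel {kernel}: block {block}, offset {offset}, odd axes {odd_axes}")
```

Because the PSF-derived kernels are odd in every axis, imgSize + kernelSize - 1 is odd exactly when imgSize is odd. That would explain why an odd cropped image triggers the problem even with "entire image at once": every FFT block then has odd extents.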
 block 0(CPU  0): copy 150
 block 0(CUDA 0): compute 1060
 block 0(CPU  0): paste 40
0 a: -1400 ms.
0 b: -1430 ms.
 block 0(CPU  0): copy 175
 block 0(CUDA 0): compute 1085
 block 0(CPU  0): paste 45
1 a: -1465 ms.
1 b: -1565 ms.
 block 0(CPU  0): copy 165
 block 0(CUDA 0): compute 1015
 block 0(CPU  0): paste 45
0 a: -1390 ms.
0 b: -1300 ms.
 block 0(CPU  0): copy 175
 block 0(CUDA 0): compute 950
 block 0(CPU  0): paste 45
1 a: -1265 ms.
1 b: -1325 ms.
 block 0(CPU  0): copy 160
 block 0(CUDA 0): compute 965
 block 0(CPU  0): paste 45
0 a: -1305 ms.
0 b: -1265 ms.
 block 0(CPU  0): copy 185
 block 0(CUDA 0): compute 995
 block 0(CPU  0): paste 40
1 a: -1375 ms.
1 b: -1415 ms.
 block 0(CPU  0): copy 150
 block 0(CUDA 0): compute 1080
 block 0(CPU  0): paste 40
0 a: -1455 ms.
0 b: -1355 ms.
 block 0(CPU  0): copy 200
 block 0(CUDA 0): compute 1000
 block 0(CPU  0): paste 40
1 a: -3100 ms.
1 b: -1310 ms.
 block 0(CPU  0): copy 175
 block 0(CUDA 0): compute 960
 block 0(CPU  0): paste 45
0 a: -1365 ms.
0 b: -1420 ms.
 block 0(CPU  0): copy 180
 block 0(CUDA 0): compute 1075
 block 0(CPU  0): paste 45
1 a: -1410 ms.
1 b: -1495 ms.
 block 0(CPU  0): copy 180
 block 0(CUDA 0): compute 1060
 block 0(CPU  0): paste 40
0 a: -1425 ms.
0 b: -1435 ms.
 block 0(CPU  0): copy 175
 block 0(CUDA 0): compute 1145
 block 0(CPU  0): paste 45
1 a: -1470 ms.
1 b: -1435 ms.
 block 0(CPU  0): copy 160
 block 0(CUDA 0): compute 965
 block 0(CPU  0): paste 45
0 a: -1315 ms.
0 b: -1315 ms.
 block 0(CPU  0): copy 195
 block 0(CUDA 0): compute 950
 block 0(CPU  0): paste 45
1 a: -1950 ms.
1 b: -1345 ms.
 block 0(CPU  0): copy 160
 block 0(CUDA 0): compute 920
 block 0(CPU  0): paste 40
0 a: -1270 ms.
0 b: -1215 ms.
 block 0(CPU  0): copy 165
 block 0(CUDA 0): compute 975
 block 0(CPU  0): paste 45
1 a: -1305 ms.
1 b: -1255 ms.
 block 0(CPU  0): copy 145
 block 0(CUDA 0): compute 1085
 block 0(CPU  0): paste 40
0 a: -1455 ms.
0 b: -1370 ms.
 block 0(CPU  0): copy 170
 block 0(CUDA 0): compute 1155
 block 0(CPU  0): paste 45
1 a: -1515 ms.
1 b: -1405 ms.
 block 0(CPU  0): copy 155
 block 0(CUDA 0): compute 930
 block 0(CPU  0): paste 45
0 a: -1275 ms.
0 b: -1320 ms.
 block 0(CPU  0): copy 195
 block 0(CUDA 0): compute 1035
 block 0(CPU  0): paste 45
1 a: -3180 ms.
1 b: -1410 ms.
fmax = NaN
fmin = NaN
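The log reports "sum change: NaN" already at iteration 0, and the final fmax/fmin are NaN. In a Richardson-Lucy style update the estimate is multiplied by a convolved correction, so a NaN produced by one convolution never disappears and spreads to its neighbours on every pass. The toy 1D sketch below (plain Python, a simplified stand-in rather than the plugin's actual update) injects a single NaN and shows it taking over the whole signal, which suggests the very first GPU convolution already returned non-finite values.

```python
import math


def convolve_circular(signal, kernel):
    """Direct circular convolution with an odd-length kernel centred at len(kernel) // 2."""
    n, r = len(signal), len(kernel) // 2
    return [sum(signal[(i - j + r) % n] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]


def richardson_lucy_step(estimate, observed, psf):
    """One multiplicative Richardson-Lucy update (single view, no regularization)."""
    blurred = convolve_circular(estimate, psf)
    ratio = [o / b for o, b in zip(observed, blurred)]  # float division by NaN yields NaN
    correction = convolve_circular(ratio, psf[::-1])
    return [e * c for e, c in zip(estimate, correction)]


psf = [0.25, 0.5, 0.25]
observed = [1.0, 2.0, 4.0, 2.0, 1.0, 1.0, 1.0]
estimate = list(observed)
estimate[3] = float("nan")  # hypothetical: one bad value from a faulty convolution

for it in range(3):
    new = richardson_lucy_step(estimate, observed, psf)
    sum_change = sum(abs(a - b) for a, b in zip(new, estimate))
    nan_count = sum(math.isnan(v) for v in new)
    print(f"iteration {it}: sum change {sum_change}, NaN pixels {nan_count}/{len(new)}")
    estimate = new
```

Checking each convolution result with math.isfinite before the update would turn this silent failure into an immediate error at the first iteration.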

Software: Fiji continuous release

OS: Windows 7 64-bit

Hardware: NVIDIA GeForce GTX Titan Black, 64 GB RAM, Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz

Is this a bug or are my settings wrong?

Thank you!

tibuch, Jul 17 '15 14:07

Hi, interesting. Can you share your input data so I can reproduce the problem? I am not sure whether it comes from the Java side or the native CUDA side. Best,
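One way to narrow down the Java vs. CUDA question is to compare the GPU output for an odd-sized input against a slow reference. The sketch below (plain Python, not part of either code base) checks that circular convolution through the DFT matches direct convolution for an odd length, so odd sizes are fine mathematically; a mismatch in the real pipeline would then point at buffer sizing or indexing rather than at the method.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def fft_convolve(signal, kernel):
    """Circular convolution via the convolution theorem (kernel zero-padded to len(signal))."""
    padded = list(kernel) + [0.0] * (len(signal) - len(kernel))
    spectrum = [a * b for a, b in zip(dft(signal), dft(padded))]
    return [v.real for v in dft(spectrum, inverse=True)]


def direct_convolve(signal, kernel):
    """Reference circular convolution: y[i] = sum_j kernel[j] * signal[(i - j) mod n]."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]


signal = [float(v) for v in range(1, 8)]  # odd length: 7
kernel = [0.25, 0.5, 0.25]
err = max(abs(a - b) for a, b in zip(fft_convolve(signal, kernel),
                                      direct_convolve(signal, kernel)))
print(f"max abs difference for n={len(signal)}: {err:.2e}")
```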

Reply to this email directly or view it on GitHub: https://github.com/StephanPreibisch/FourierConvolutionCUDALib/issues/9

psteinb, Aug 05 '15 12:08