FourierConvolutionCUDALib
MVD fails with odd dimensions
Hello @StephanPreibisch
I ran some MVD on an NVIDIA GeForce GTX Titan Black. If I set the block size to an odd number in one or more dimensions, the deconvolved image is only black and white. The same happens if one or more dimensions of the cropped image are odd and the block size is set to "entire image at once".
This is a screenshot of the log and the deconvolved image:
If the MVD is computed on the CPU, it works with odd dimensions.
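Until the cause is known, one possible stopgap is to check the image or block dimensions before a GPU run and round odd sizes up to the next even number. The sketch below is only an illustration in Python: the helper names `odd_axes` and `round_up_to_even` are hypothetical and not part of the plugin, and it assumes, as the observations above suggest, that even sizes are handled correctly on the GPU.

```python
def odd_axes(dims):
    """Return the indices of the axes whose size is odd."""
    return [axis for axis, size in enumerate(dims) if size % 2 != 0]


def round_up_to_even(dims):
    """Add one to every odd size so that all axes end up even."""
    return tuple(size + (size % 2) for size in dims)


if __name__ == "__main__":
    img_size = (513, 513, 513)  # imgSize as printed in the console output
    print("odd axes:", odd_axes(img_size))
    print("even-padded size:", round_up_to_even(img_size))
```

Rounding up rather than down keeps all of the cropped data; the extra plane per axis would have to be filled or cropped away again afterwards.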
Log output:
Using spimdata version: 0.9-revision
Using spimreconstruction version: 2.3.9
angles selected: 0, 90
channels selected: 0
illuminations selected: 0
Timepoints selected: 0
Fri Jul 17 13:00:39 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A000_stack.tif' ...
Fri Jul 17 13:00:39 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A000_stack.tif' [1024x1024x115 image=ArrayImg<UnsignedShortType>]
Fri Jul 17 13:00:40 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A090_stack.tif' ...
Fri Jul 17 13:00:41 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A090_stack.tif' [1024x1024x115 image=ArrayImg<UnsignedShortType>]
Fri Jul 17 13:00:42 CEST 2015: Estimating Bounding Box for Fusion. If size of images is not known (they were never opened before), some of them need to be opened once to determine their size.
Min: (356, 184, 80)
Max: (867, 695, 591)
Fri Jul 17 13:00:44 CEST 2015: Estimating Bounding Box for Fusion. If size of images is not known (they were never opened before), some of them need to be opened once to determine their size.
Looking for native libraries ending with '.dll' in directory: 'D:\Hannah\CUDA_fourierdeconvolution\Release' ...
Trying to load following library: D:\Hannah\CUDA_fourierdeconvolution\Release\Convolution3D_fftCUDAlib.dll
Using an outdated version of the CUDA libs, cannot query free memory. Assuming total memory.
Using device GeForce GTX TITAN Black (id=0, mem=6144MB (6144MB free), CUDA capability 3.5)
beads --- channel: 0 angle: 0 illum: 0 timepoint: 0: 37 correspondences.
beads --- channel: 0 angle: 90 illum: 0 timepoint: 0: 37 correspondences.
Found 1 label(s) with correspondences for channel 0:
Label 'beads' (channel 0) has 2/2 views with corresponding detections.
Channel 0: extract PSF from label 'beads'
Fusing view 0 of 1
(Fri Jul 17 13:01:07 CEST 2015): Reserving memory for fused & weight image.
Fri Jul 17 13:01:07 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A000_stack.tif' ...
Fri Jul 17 13:01:07 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A000_stack.tif' [1024x1024x115 image=ArrayImg<FloatType>]
Extracting PSF for viewsetup 0 using label 'beads' (37 corresponding detections available)
PSF size: (5, 5, 9)
Fusing view 1 of 1
(Fri Jul 17 13:01:24 CEST 2015): Reserving memory for fused & weight image.
Fri Jul 17 13:01:24 CEST 2015: Loading 'D:\Hannah\gpu test\.\.\A090_stack.tif' ...
Fri Jul 17 13:01:24 CEST 2015: Opened 'D:\Hannah\gpu test\.\.\A090_stack.tif' [1024x1024x115 image=ArrayImg<FloatType>]
Extracting PSF for viewsetup 1 using label 'beads' (37 corresponding detections available)
PSF size: (5, 5, 9)
Minimal number of overlapping views: 2, using 2
Average number of overlapping views: 2.0, using 2.0
Type of iteration: OPTIMIZATION_I
Number iterations: 10
OSEM speedup: 1.0
Using blocks: false
Using CUDA: true
Blending border: 2x2x1
Blending range: 25x25x25
PSF size (extracting): 5x5x9
Channel 0 extracts from label 'beads'.
ImgLib container (deconvolved): ArrayImgFactory
Using Tikhonov regularization (lambda = 0.0060)
maxSize: (75, 7, 79)
Fri Jul 17 13:01:42 CEST 2015: numThreads = 16
Fri Jul 17 13:01:45 CEST 2015: done normalizing.
Min number of overlapping views: 2
Average number of overlapping views: 2.0
Average intensity in overlapping area: 0.01641087
OSEM acceleration: 1.0
Deconvolved image container: ArrayContainerFactory
iteration: 0 (Fri Jul 17 13:01:45 CEST 2015)
iteration: 0 --- sum change: NaN --- max change per pixel: NaN
iteration: 1 (Fri Jul 17 13:01:52 CEST 2015)
iteration: 1 --- sum change: NaN --- max change per pixel: NaN
iteration: 2 (Fri Jul 17 13:01:58 CEST 2015)
iteration: 2 --- sum change: NaN --- max change per pixel: NaN
iteration: 3 (Fri Jul 17 13:02:04 CEST 2015)
iteration: 3 --- sum change: NaN --- max change per pixel: NaN
iteration: 4 (Fri Jul 17 13:02:12 CEST 2015)
iteration: 4 --- sum change: NaN --- max change per pixel: NaN
iteration: 5 (Fri Jul 17 13:02:19 CEST 2015)
iteration: 5 --- sum change: NaN --- max change per pixel: NaN
iteration: 6 (Fri Jul 17 13:02:25 CEST 2015)
iteration: 6 --- sum change: NaN --- max change per pixel: NaN
iteration: 7 (Fri Jul 17 13:02:32 CEST 2015)
iteration: 7 --- sum change: NaN --- max change per pixel: NaN
iteration: 8 (Fri Jul 17 13:02:38 CEST 2015)
iteration: 8 --- sum change: NaN --- max change per pixel: NaN
iteration: 9 (Fri Jul 17 13:02:45 CEST 2015)
iteration: 9 --- sum change: NaN --- max change per pixel: NaN
DONE (Fri Jul 17 13:02:53 CEST 2015).
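In the log above, the NaN values already appear at iteration 0, so the run is broken before any iterative update has had a chance to diverge. A small Python sketch for spotting this automatically in a saved log follows; the function name `nan_iterations` is hypothetical, and the regular expression assumes the exact line format shown above.

```python
import math
import re

# Matches lines such as:
# "iteration: 3 --- sum change: NaN --- max change per pixel: NaN"
ITERATION_LINE = re.compile(
    r"iteration: (\d+) --- sum change: (\S+) --- max change per pixel: (\S+)"
)


def nan_iterations(log_text):
    """Return the iteration numbers whose reported change values contain NaN."""
    bad = []
    for match in ITERATION_LINE.finditer(log_text):
        iteration = int(match.group(1))
        values = (float(match.group(2)), float(match.group(3)))
        if any(math.isnan(value) for value in values):
            bad.append(iteration)
    return bad


if __name__ == "__main__":
    sample = (
        "iteration: 0 (Fri Jul 17 13:01:45 CEST 2015)\n"
        "iteration: 0 --- sum change: NaN --- max change per pixel: NaN\n"
    )
    print(nan_iterations(sample))
```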
Console output:
fmax = 2.0
fmin = 0.0
imgSize (513, 513, 513)
kernelSize (5, 5, 79)
blockSize (517, 517, 591)
numBlocks (1, 1, 1)
effectiveSize (513, 513, 513)
effectiveLocalOffset (2, 2, 39)
imgSize (513, 513, 513)
kernelSize (75, 7, 7)
blockSize (587, 519, 519)
numBlocks (1, 1, 1)
effectiveSize (513, 513, 513)
effectiveLocalOffset (37, 3, 3)
block 0(CPU 0): copy 150
block 0(CUDA 0): compute 1060
block 0(CPU 0): paste 40
0 a: -1400 ms.
0 b: -1430 ms.
block 0(CPU 0): copy 175
block 0(CUDA 0): compute 1085
block 0(CPU 0): paste 45
1 a: -1465 ms.
1 b: -1565 ms.
block 0(CPU 0): copy 165
block 0(CUDA 0): compute 1015
block 0(CPU 0): paste 45
0 a: -1390 ms.
0 b: -1300 ms.
block 0(CPU 0): copy 175
block 0(CUDA 0): compute 950
block 0(CPU 0): paste 45
1 a: -1265 ms.
1 b: -1325 ms.
block 0(CPU 0): copy 160
block 0(CUDA 0): compute 965
block 0(CPU 0): paste 45
0 a: -1305 ms.
0 b: -1265 ms.
block 0(CPU 0): copy 185
block 0(CUDA 0): compute 995
block 0(CPU 0): paste 40
1 a: -1375 ms.
1 b: -1415 ms.
block 0(CPU 0): copy 150
block 0(CUDA 0): compute 1080
block 0(CPU 0): paste 40
0 a: -1455 ms.
0 b: -1355 ms.
block 0(CPU 0): copy 200
block 0(CUDA 0): compute 1000
block 0(CPU 0): paste 40
1 a: -3100 ms.
1 b: -1310 ms.
block 0(CPU 0): copy 175
block 0(CUDA 0): compute 960
block 0(CPU 0): paste 45
0 a: -1365 ms.
0 b: -1420 ms.
block 0(CPU 0): copy 180
block 0(CUDA 0): compute 1075
block 0(CPU 0): paste 45
1 a: -1410 ms.
1 b: -1495 ms.
block 0(CPU 0): copy 180
block 0(CUDA 0): compute 1060
block 0(CPU 0): paste 40
0 a: -1425 ms.
0 b: -1435 ms.
block 0(CPU 0): copy 175
block 0(CUDA 0): compute 1145
block 0(CPU 0): paste 45
1 a: -1470 ms.
1 b: -1435 ms.
block 0(CPU 0): copy 160
block 0(CUDA 0): compute 965
block 0(CPU 0): paste 45
0 a: -1315 ms.
0 b: -1315 ms.
block 0(CPU 0): copy 195
block 0(CUDA 0): compute 950
block 0(CPU 0): paste 45
1 a: -1950 ms.
1 b: -1345 ms.
block 0(CPU 0): copy 160
block 0(CUDA 0): compute 920
block 0(CPU 0): paste 40
0 a: -1270 ms.
0 b: -1215 ms.
block 0(CPU 0): copy 165
block 0(CUDA 0): compute 975
block 0(CPU 0): paste 45
1 a: -1305 ms.
1 b: -1255 ms.
block 0(CPU 0): copy 145
block 0(CUDA 0): compute 1085
block 0(CPU 0): paste 40
0 a: -1455 ms.
0 b: -1370 ms.
block 0(CPU 0): copy 170
block 0(CUDA 0): compute 1155
block 0(CPU 0): paste 45
1 a: -1515 ms.
1 b: -1405 ms.
block 0(CPU 0): copy 155
block 0(CUDA 0): compute 930
block 0(CPU 0): paste 45
0 a: -1275 ms.
0 b: -1320 ms.
block 0(CPU 0): copy 195
block 0(CUDA 0): compute 1035
block 0(CPU 0): paste 45
1 a: -3180 ms.
1 b: -1410 ms.
fmax = NaN
fmin = NaN
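The sizes printed at the top of the console output are consistent with blockSize = imgSize + kernelSize - 1 and effectiveLocalOffset = kernelSize / 2 (integer division) on each axis. This relation is inferred from the printed numbers only, not from the plugin source, and the function name `block_geometry` below is hypothetical. A short Python sketch that reproduces the values:

```python
def block_geometry(img_size, kernel_size):
    """Return (block_size, local_offset) per axis.

    Assumed relation, inferred from the console output:
    block = image + kernel - 1, offset = kernel // 2.
    """
    block = tuple(i + k - 1 for i, k in zip(img_size, kernel_size))
    offset = tuple(k // 2 for k in kernel_size)
    return block, offset


if __name__ == "__main__":
    img = (513, 513, 513)
    for kernel in ((5, 5, 79), (75, 7, 7)):
        block, offset = block_geometry(img, kernel)
        print("kernel", kernel, "-> block", block, "offset", offset)
```

With an odd image size and an odd kernel size, the resulting block size is odd on every axis, which matches the failing case described in this report.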
Software: Fiji continuous release
OS: Windows 7 64bit
Hardware: NVIDIA GeForce GTX Titan Black, 64GB RAM, Intel(R) Xeon(R) CPU E5-2687W v2 @ 3.40GHz
Is this a bug or are my settings wrong?
Thank you!
Hi - interesting. Can you share your input data so that I can reproduce the problem? I am not sure whether it comes from the Java side or the native CUDA side. Best -
On 07/17/2015 04:52 PM, tibuch wrote:
Reply to this email directly or view it on GitHub: https://github.com/StephanPreibisch/FourierConvolutionCUDALib/issues/9