Compute-continuous-moments-de-ned-in-a-rectangular-region-using-CUDA-and-some-applications
Compute-continuous-moments-de-ned-in-a-rectangular-region-using-CUDA-and-some-applications copied to clipboard
I am rewriting the code in paper "A new fast algorithm to compute continuous moments defined in a rectangular region" using CUDA, and try to find some applications.
The main idea of image moments is to treat the image as a discrete two-dimensional function, and find a group of orthogonal basis functions to approach the discrete two-dimensional function. There are two important innovations: 1.Apply the nearest neighbor interpolation to calculate image moments accurately; 2.Apply matrix multiplication to accelerate the calculation.
We can go three steps further:
- Replace the nearest neighbor interpolation with bilinear interpolation in formula (31) to calculate the integration more accurately;
- Integrate the formula (47) and (48), where the rows of matrix B and C are the orders of Legendre polynomials;
- When reconstructing the image, change the formula (47) into subdiagonal triangular matrix, and apply matrix multiplication. The formula (47) would be like $$\left[\begin{matrix} \lambda_0,_0 & \lambda_0,1 & ... & \lambda_0,{n-1} & \lambda_0,_n \ \lambda_1,_0 & \lambda_1,1 & ... & \lambda_1,{n-1} & 0 \ \lambda_2,_0 & \lambda_2,_1 & ... & 0 & 0 \ . & . & ... & . & . \ . & . & ... & . & . \ \lambda_m,_0 & 0 & ... & 0 & 0 \ \end{matrix}\right].$$
1.Some image related applications:
-
Image reconstruction. When the K is 11 and order is 1024, the MSE is 1.8 and PSNR is 45.5
bash run.sh
./test_images/lenna-1024_11_1024_recons.jpg is the image reconstructions result.
Runnvprof ./build/reconstruction
to see the kernel time.MSE: 1.80595 PSNR: 45.5637 ==760574== Profiling application: ./build/reconstruction ==760574== Profiling result: Type Time(%) Time Calls Avg Min Max Name GPU activities: 88.86% 78.241ms 2 39.121ms 6.6041ms 71.637ms volta_sgemm_128x128_nn 7.42% 6.5349ms 1 6.5349ms 6.5349ms 6.5349ms inter_liner_k(float*, unsigned char*, int, int, int, int, float) 1.62% 1.4307ms 2 715.35us 714.99us 715.72us volta_sgemm_128x64_nn 0.84% 736.59us 4 184.15us 72.894us 297.53us p_polynomial(float*, int, float, float) 0.75% 664.08us 2 332.04us 55.070us 609.01us matrix_transpose(float*, float const *, int, int) 0.21% 182.08us 1 182.08us 182.08us 182.08us [CUDA memcpy DtoH] 0.19% 167.64us 5 33.528us 1.1520us 162.78us [CUDA memcpy HtoD] 0.05% 46.591us 1 46.591us 46.591us 46.591us recon_img_f2u(unsigned char*, float*, int, int) 0.05% 42.943us 1 42.943us 42.943us 42.943us multiply_coff(float*, float*, int, int) 0.00% 1.8560us 2 928ns 832ns 1.0240us [CUDA memset]
-
Extract image edge information. According to the Cauchy Convergence Theorem,
for any ε>0, there exists N, such that M~1~, M~2~ > N, $\lvert f_{M_1}(x,y)-f_{M_2}(x,y) \rvert$<ε.
Assume M~1~ > M~2~, $f_{M_1}(x,y)-f_{M_2}(x,y)$ contains very small information of the original image, which is the details or edges of image. Simon Liao also mentioned that in Image analysis by orthogonal Fourier-Mellin moments[2013].
Runnvprof ./build/edge_extraction
to see the kernel time.==132176== NVPROF is profiling process 132176, command: ./build/edge_extraction cudaEvent_t time: 302131 us ==132176== Profiling application: ./build/edge_extraction ==132176== Profiling result: Type Time(%) Time Calls Avg Min Max Name GPU activities: 66.33% 4.0668ms 3 1.3556ms 233.56us 3.4446ms volta_sgemm_128x64_nn 13.72% 841.32us 1 841.32us 841.32us 841.32us volta_sgemm_128x32_sliced1x4_nn 9.13% 559.73us 1 559.73us 559.73us 559.73us inter_liner_k(float*, unsigned char*, int, int, int, int, float) 2.72% 166.72us 5 33.343us 1.0240us 162.33us [CUDA memcpy HtoD] 2.61% 159.84us 1 159.84us 159.84us 159.84us [CUDA memcpy DtoH] 2.41% 147.84us 4 36.959us 31.648us 40.959us p_polynomial(float*, int, float, float) 2.03% 124.22us 2 62.110us 32.895us 91.325us matrix_transpose(float*, float const *, int, int) 0.80% 49.119us 1 49.119us 49.119us 49.119us recon_img_f2u(unsigned char*, float*, int, int) 0.22% 13.695us 1 13.695us 13.695us 13.695us multiply_coff(float*, float*, int, int) 0.03% 1.9840us 2 992ns 864ns 1.1200us [CUDA memset]
// TODO
3. Calaulate image gradient in X and Y direction. The basis functions is uniformly continuous, we can exchange the order of integral and summation in formula (6) and (7), calculate the partial derivative in X and Y direction. The f_{n}'(x,y) is not Uniform Convergence, that does not work.
4. Image resize. When restructing the image, divide the [-1, 1] into different pieces to get a bigger or smaller image.
2.Point cloud related applications:
Point cloud is a discrete three-dimensional function, we can get similar results as image.
- point cloud compress.
- Extract point cloud edge information.
- Calculate the gradient in X, Y and Z direction of each point. This may be important in self-driving. For example, we can detects obstacles, uphills, downhills, speed bumps, curbs, puddles, etc. to get a drivable area by calculating the gradient in Z direction. Same as the image, that does not work.
- Make point cloud dense or sparse. We can get the point values in fixed positions, and turn a frame of messy point clouds into a regular sparse matrix.