grass icon indicating copy to clipboard operation
grass copied to clipboard

v.surf.rst: Cross-validation OpenMP support

Open cyliang368 opened this issue 1 year ago • 9 comments

This finishes the implementation of OpenMP support for cross-validation (#2644). A critical block is added to segmen2d_parallel.c to make the writing to the database in serial. main.c is modified to allow parallel computing with nprocs>1. Two simple test scripts are used to run cross-validation with nprocs=1 and nprocs>1. The results of these tests are the same.

cyliang368 avatar Apr 13 '24 03:04 cyliang368

Compilation errors:

2024-04-15T17:35:03.7536227Z point2d.c: In function ‘IL_check_at_points_2d_cvdev’:
2024-04-15T17:35:03.7537488Z point2d.c:213:19: error: unused variable ‘mm’ [-Werror=unused-variable]
2024-04-15T17:35:03.7538241Z   213 |     int /* n1, */ mm, m;
2024-04-15T17:35:03.7538696Z       |                   ^~
2024-04-15T17:35:03.7539546Z point2d.c:210:12: error: unused variable ‘north’ [-Werror=unused-variable]
2024-04-15T17:35:03.7540477Z   210 |     double north = data->ymax;
2024-04-15T17:35:03.7541011Z       |            ^~~~~
2024-04-15T17:35:03.7541851Z point2d.c:208:12: error: unused variable ‘east’ [-Werror=unused-variable]
2024-04-15T17:35:03.7542805Z   208 |     double east = data->xmax;
2024-04-15T17:35:03.7543408Z       |            ^~~~
2024-04-15T17:35:03.7664436Z cc1: all warnings being treated as errors
2024-04-15T17:37:05.2813633Z main.c: In function ‘main’:
2024-04-15T17:37:05.2814492Z main.c:466:5: error: too few arguments to function ‘IL_init_func_2d’
2024-04-15T17:37:05.2815104Z   466 |     IL_init_func_2d(&params, IL_grid_calc_2d, IL_matrix_create,
2024-04-15T17:37:05.2815541Z       |     ^~~~~~~~~~~~~~~
2024-04-15T17:37:05.2815830Z In file included from main.c:43:
2024-04-15T17:37:05.2816586Z /home/runner/work/grass/grass/dist.x86_64-pc-linux-gnu/include/grass/interpf.h:153:6: note: declared here
2024-04-15T17:37:05.2817450Z   153 | void IL_init_func_2d(struct interp_params *, grid_calc_fn *, matrix_create_fn *,
2024-04-15T17:37:05.2817964Z       |      ^~~~~~~~~~~~~~~
2024-04-15T17:37:05.2846707Z make[4]: *** [../../include/Make/Compile.make:32: OBJ.x86_64-pc-linux-gnu/main.o] Error 1
2024-04-15T17:38:28.9886495Z main.c:660:12: error: comparison between pointer and integer [-Werror]
2024-04-15T17:38:28.9887414Z   660 |     if (cv != NULL || create_devi != NULL) {
2024-04-15T17:38:28.9888056Z       |            ^~
2024-04-15T17:38:28.9888873Z main.c:660:35: error: comparison between pointer and integer [-Werror]
2024-04-15T17:38:28.9889786Z   660 |     if (cv != NULL || create_devi != NULL) {
2024-04-15T17:38:28.9890443Z       |                                   ^~
2024-04-15T17:38:29.0160707Z cc1: all warnings being treated as errors
2024-04-15T17:38:29.0177415Z make[3]: *** [../../include/Make/Compile.make:32: OBJ.x86_64-pc-linux-gnu/main.o] Error 1

petrasovaa avatar Apr 15 '24 18:04 petrasovaa

@cyliang368 Good to see you working on this! May I recommend using pre-commit, which will take care of formatting and other useful checks locally.

nilason avatar Apr 15 '24 19:04 nilason

You may also turn on -Wall -Wextra -Wpedantic -Wvla compiler flags as well (those are checked on some of the CI runners), thus saving some CI time.

nilason avatar Apr 15 '24 19:04 nilason

@petrasovaa @nilason I didn't notice that warnings are treated as errors and that I should use pre-commit. Thanks for the advice!

cyliang368 avatar Apr 15 '24 19:04 cyliang368

I wouldn't consider a critical block parallel because it restricts execution of them to one thread at a time and performance won't be much different from serial. Have you checked speedup and efficiency? Any plots to prove performance improvement?

Ideally, it would be best to create a parallel version of lib/rst/interp_float/point2d.c just like lib/rst/interp_float/segment2d_parallel.c.

@HuidaeCho I split the IL_check_at_points_2d function into two functions. One is used for calculation, whereas the other is used to write values into the map. Now, when running cross-validation or creating deviation, the parallel for loop in lib/rst/interp_float/segment2d_parallel.c computes everything in parallel except for file writing.

With this implementation, we get higher speedup and efficiency (green lines in the figures below). I modified some functions, files, and the structure of loops for this implementation. Please review them. Thanks!

speedup

efficiency

cyliang368 avatar Apr 15 '24 20:04 cyliang368

Before this can be merged, we need a test, at least just testing the cross-validation results are the same as before (reference based on results before this change) and that they are the same for different number of cores. The test can be added to v.surf.rst/testsuite/test_vsurfrst.py.

petrasovaa avatar Apr 16 '24 14:04 petrasovaa

Before this can be merged, we need a test, at least just testing the cross-validation results are the same as before (reference based on results before this change) and that they are the same for different number of cores. The test can be added to v.surf.rst/testsuite/test_vsurfrst.py.

@petrasovaa I added tests for parallelized cross-validation and deviation-creation. I used the results from IL_interp_segments_2d, which I didn't change, to check the results from IL_interp_segments_2d_parallel. The results are consistent, no matter whether single thread or multi-threads is used. Please check https://github.com/OSGeo/grass/pull/3590/commits/d4b8da7122c31dda61f592a2166aecb608b0899e.

cyliang368 avatar Apr 17 '24 04:04 cyliang368

I wouldn't consider a critical block parallel because it restricts execution of them to one thread at a time and performance won't be much different from serial. Have you checked speedup and efficiency? Any plots to prove performance improvement? Ideally, it would be best to create a parallel version of lib/rst/interp_float/point2d.c just like lib/rst/interp_float/segment2d_parallel.c.

@HuidaeCho I split the IL_check_at_points_2d function into two functions. One is used for calculation, whereas the other is used to write values into the map. Now, when running cross-validation or creating deviation, the parallel for loop in lib/rst/interp_float/segment2d_parallel.c computes everything in parallel except for file writing.

With this implementation, we get higher speedup and efficiency (green lines in the figures below). I modified some functions, files, and the structure of loops for this implementation. Please review them. Thanks!

speedup

efficiency

Looks great! I would keep the original point2d.c and copy it to point2d_parallel.c just like segment2d_parallel.c for consistency and just in case. This PR now breaks backward compatibility: 1) adding new init functions (new arguments; IL_write_point_2d is new, but is it necessary?), 2) changing pass-by-value to pass-by-reference. Are these all necessary and can we avoid doing these?

The code for IL_write_point_2d repeats three times in point2d.c. Given the fact that it already repeats twice before this PR in point2d.c, I would say, we don't even need to add this new function and argument to IL_init_func_2d. Why and when would we pass another function than this code? Have you considered calling this function within the library from the three places without adding it as an argument to IL_init_func_2d?

HuidaeCho avatar Apr 18 '24 01:04 HuidaeCho

I wouldn't consider a critical block parallel because it restricts execution of them to one thread at a time and performance won't be much different from serial. Have you checked speedup and efficiency? Any plots to prove performance improvement? Ideally, it would be best to create a parallel version of lib/rst/interp_float/point2d.c just like lib/rst/interp_float/segment2d_parallel.c.

@HuidaeCho I split the IL_check_at_points_2d function into two functions. One is used for calculation, whereas the other is used to write values into the map. Now, when running cross-validation or creating deviation, the parallel for loop in lib/rst/interp_float/segment2d_parallel.c computes everything in parallel except for file writing. With this implementation, we get higher speedup and efficiency (green lines in the figures below). I modified some functions, files, and the structure of loops for this implementation. Please review them. Thanks! speedup efficiency

Looks great! I would keep the original point2d.c and copy it to point2d_parallel.c just like segment2d_parallel.c for consistency and just in case. This PR now breaks backward compatibility: 1) adding new init functions (new arguments; IL_write_point_2d is new, but is it necessary?), 2) changing pass-by-value to pass-by-reference. Are these all necessary and can we avoid doing these?

The code for IL_write_point_2d repeats three times in point2d.c. Given the fact that it already repeats twice before this PR in point2d.c, I would say, we don't even need to add this new function and argument to IL_init_func_2d. Why and when would we pass another function than this code? Have you considered calling this function within the library from the three places without adding it as an argument to IL_init_func_2d?

Thanks for reviewing my PR. These comments gave me some ideas on how to improve it. Based on these two questions, I pushed new commits, and I wrote down my thoughts here.

1. adding new init functions (new arguments; IL_write_point_2d is new, but is it necessary?

It is not necessary to add a new init function and argument. However, to make them clear, I had thought I shouldn't make a function do two tasks. IL_check_at_points_2d did two things: compute the deviations and write values to the database. Therefore, I split it into two functions.

In the new commit, I included IL_write_point_2d into IL_write_point_2d_parallel so that we don't need to change the definition and usage of IL_init_func_2d.

2. changing pass-by-value to pass-by-reference. Are these all necessary and can we avoid doing these?

This is necessary in my implementation. This question is related to the previous question. IL_check_at_points_2d function actually does two things. Computing should be parallelized while writing out should be in serial. I can only figure out three ways to save the outputs: (1) make this function return values, (2) pass-by-reference and save the values by this pointer, and (3) directly write them out to the database. Since we want to parallelize the function, we cannot use (3). (1) does not seem to be preferred, so I chose (2).

This is my first time contributing to open-source code, so I am unsure whether these are preferred practices. Please review them and give me advice. Thanks a lot.

cyliang368 avatar Apr 21 '24 19:04 cyliang368

In this case I am fine with breaking the backwards compatibility, the API is not widely used, so I don't expect any problems.

petrasovaa avatar Jun 17 '24 09:06 petrasovaa

(2) pass-by-reference and save the values by this pointer

@cyliang368 Does this still apply? I cannot find anywhere in IL_check_at_points_2d() that updates skip_point.

            xx = points[m - 1].x - skip_point->x;           
            yy = points[m - 1].y - skip_point->y;            
...
        zz = skip_point->z + zmin;        
...
        xmm = skip_point->x * dnorm + params->x_orig + west;
        ymm = skip_point->y * dnorm + params->y_orig + south;

skip_point seems read-only now? Have things changed?

HuidaeCho avatar Jun 24 '24 13:06 HuidaeCho

(2) pass-by-reference and save the values by this pointer

@cyliang368 Does this still apply? I cannot find anywhere in IL_check_at_points_2d() that updates skip_point.

            xx = points[m - 1].x - skip_point->x;           
            yy = points[m - 1].y - skip_point->y;            
...
        zz = skip_point->z + zmin;        
...
        xmm = skip_point->x * dnorm + params->x_orig + west;
        ymm = skip_point->y * dnorm + params->y_orig + south;

skip_point seems read-only now? Have things changed?

Never mind. You use it in IL_check_at_points_2d_cvdev(). ;-)

HuidaeCho avatar Jun 24 '24 13:06 HuidaeCho

I'm running a new run of CI with the last 7 days of linter upgrades made, then merge

echoix avatar Jun 24 '24 16:06 echoix