nest-simulator icon indicating copy to clipboard operation
nest-simulator copied to clipboard

Modernizing the code base and making it more compliant to common C++ style guides

Open JanVogelsang opened this issue 2 years ago • 1 comments

After working on the nest kernel for some time, I stumbled across multiple cases of either error-prone or nonmodern code that in some cases even led to hard-to-find bugs that could have been easily avoided if the code followed modern C++ style guides.
Running cang-tidy on the whole codebase with the most important checks gave me ~8000 possible improvements. I took the time and went through all suggestions and applied the most important ones, especially those for which I knew they would definitely not break anything. I manually applied about 2000 of them, fixing the following problems:

  • Clean up .gitignore, as there were duplicated entries
  • Remove an unused closing html tag in the SLI-documentation
  • Annotate constructors with a single argument with the explicit keyword, to make sure that there is no implicit conversion happening between types that should not be converted without explicitly doing so
  • Remove redundant void from method signature if the method accepts no arguments. Putting void into the argument list is an artifact from old C code, but is not written in modern C++ anymore
  • Remove explicit variable initialization in cases where the value is never used, as the variable is defined again before ever accessing its value
  • Replace all old C NULL pointers values by the modern C++ nullptr, which is advised by Bjarne Stroustrup himself
  • Instead of initializing null-pointers using type* ptr = 0;, explicitly set the pointer to a nullptr (type* ptr = nullptr;)
  • Add override annotation when a function is supposed to be overridden to make sure errors are prevented when the signature of the derived class' method signature deviates from the parent's signature. There has been a bug in the archiving node related to this issue for example, which would not have been in the code for many months if we had the override annotation on that function

It still would be a lot of work to clean up the whole codebase, but this PR is a good start in making NEST a sustainable software.

JanVogelsang avatar Sep 22 '22 14:09 JanVogelsang

The MacOS test seems to fail for some reason, can someone help me find the cause of the issue? This was the only error I was able to find in the logs, but I have no idea what that is supposed to tell me:

System call: unlink(2) /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T//ompi.Mac-1663857184846.501/pid.77232/1/vader_segment.Mac-1663857184846.501.242a0001.3
  Error:       No such file or directory (errno 2)

JanVogelsang avatar Sep 22 '22 15:09 JanVogelsang

@JanVogelsang I built under macOS now (with Clang 14 while Github runners have Clang 13, but should not make much of a difference), and did not run into any problems. I also noted two things in the FULL_MACOS build here:

  • All tests pass, so the error message seems inconsequential. Let's see if it appears in the future again.
  • In the FULL_MACOS build, there is a large number of compiler warnings, which all originate from a small number of lines and are not issued with master. I also get them when building on my machine. Here some links to the log:
    • https://github.com/nest/nest-simulator/actions/runs/3106295133/jobs/5033069063#step:8:560
    • https://github.com/nest/nest-simulator/actions/runs/3106295133/jobs/5033069063#step:8:571
    • https://github.com/nest/nest-simulator/actions/runs/3106295133/jobs/5033069063#step:8:581
    • https://github.com/nest/nest-simulator/actions/runs/3106295133/jobs/5033069063#step:8:591
    • https://github.com/nest/nest-simulator/actions/runs/3106295133/jobs/5033069063#step:8:1159

and there are probably a few more. They all seem to be related to overriding.

heplesser avatar Sep 29 '22 08:09 heplesser

@heplesser I addressed all issues related to overrides in my latest commit, which should fix the compiler warnings. The warning we got originated from the following problem:
'...' overrides a member function but is not marked 'override': Adding the override specifier is not mandatory and simply produces a compilation error whenever a function in a derived class that is supposed to override a function in the base class doesn't actually override anything. This helps us developers to avoid problems where the function signatures just vary slightly (e.g. using 'SpikeEvent' in one place and 'Event' in another, which would - and actually did - result in a bug).

JanVogelsang avatar Sep 29 '22 14:09 JanVogelsang

@JanVogelsang Thanks a lot, nice that all now passes without warnings on all systems. I saw that the "unlink" error message is still there on macOS, but that might just as well come from the py-mpi-testing setup, so I would not worry about it.

heplesser avatar Sep 30 '22 06:09 heplesser

I went through all my changes again, to make sure that my statement that nothing changes performance-wise is correct, and that is indeed the case.
For any potential reviewers: 90% of the changes are just missing override specifiers and using nullptr instead of NULL or 0 now. If somebody would review those, there are only about 100 lines left with other changes that might need some more time to review.

JanVogelsang avatar Sep 30 '22 08:09 JanVogelsang

There are still quite a few formatting errors. Most of them because of leftover whitespace in empty parentheses of functions, for example here: https://github.com/nest/nest-simulator/blob/96620fe9d6f80dcc0b94a17fbe6bd2d52ff06049/libnestutil/lockptr.h#L124

It's not picked up because of a bug (#2471) which will be fixed by #2478.

hakonsbm avatar Sep 30 '22 09:09 hakonsbm

Alright, will fix those asap.

JanVogelsang avatar Sep 30 '22 09:09 JanVogelsang

@jougs Could you be the other reviewer? 95+% is quickly reviewed boiler plate, in the remaining 5% are a few things that would benefit from experienced eyes.

heplesser avatar Sep 30 '22 21:09 heplesser

@JanVogelsang Just one more comment: In order to keep this PR manageable, I'd suggest to not touch any other code lines in this PR beyond those already touched (except adding #include <cmath> if needed, see inline comments). For future similar PRs, I'd suggest to strictly separate changes if at all possible, e.g., one PR only adding override, one only changing to nullptr etc to make reviewing easier.

heplesser avatar Oct 01 '22 06:10 heplesser

@heplesser Re-requesting a review seems to remove other reviewers from the list, could you add @jougs again?

JanVogelsang avatar Oct 04 '22 09:10 JanVogelsang

I don't get why the static checks fail. It gives me the following error message:

Files with erroneous copyright headers:
... sli/sli/lockptrdatum.h

I can't see how the copyright header of that file is different to the other's though.

JanVogelsang avatar Oct 04 '22 11:10 JanVogelsang

I don't get why the static checks fail. It gives me the following error message:

Files with erroneous copyright headers:
... sli/sli/lockptrdatum.h

I can't see how the copyright header of that file is different to the other's though.

It seems that you accidentally moved sli/dictstack.h to sli/sli/lockptrdatum.h, so now the file name in the copyright notice and filename do not agree.

heplesser avatar Oct 04 '22 11:10 heplesser

I don't get why the static checks fail. It gives me the following error message:

Files with erroneous copyright headers:
... sli/sli/lockptrdatum.h

I can't see how the copyright header of that file is different to the other's though.

It seems that you accidentally moved sli/dictstack.h to sli/sli/lockptrdatum.h, so now the file name in the copyright notice and filename do not agree.

See you found it too :)

heplesser avatar Oct 04 '22 11:10 heplesser

Got it! No idea how that happened... Anyway, with that out of the way, a new challenger appeared: sli/lockptrdatum.h. The pipeline complains that the formatting of this file is incorrect. However, if I run clang-format-13 (which I think is the same that the static_checks pipeline runs) on that file, the check still tells me the formatting is incorrect. If I revert any changes that have been made to this function (to a version that passed the checks before), the checks still don't pass.
I might overlook something here, but at this point I am genuinely confused.

JanVogelsang avatar Oct 04 '22 12:10 JanVogelsang

@JanVogelsang Please try compiling before you commit and push, so we don't get a lot of commits in master that doesn't compile (compiling would also have told you about the missing semicolon from earlier :wink:).

There are also still formatting errors that aren't picked up by the static checks. You can fix all C++ formatting errors by running the script build_support/format_all_c_c++_files.sh from the root directory. Just remember to use the right clang-format version. However, I suggest that we merge #2478 asap to avoid any formatting errors sneaking in.

hakonsbm avatar Oct 04 '22 13:10 hakonsbm

Yeah, will do that every time before pushing in the future. Usually do that, didn't do it in this case as I didn't imagine the changes could break anything.

Good to know, will run that!

JanVogelsang avatar Oct 04 '22 14:10 JanVogelsang

@hakonsbm How do I run build_support/format_all_c_c++_files.sh? It gives me the error clang-format not found, however running clang-format outside the script works just fine.

JanVogelsang avatar Oct 04 '22 14:10 JanVogelsang

@hakonsbm How do I run build_support/format_all_c_c++_files.sh? It gives me the error clang-format not found, however running clang-format outside the script works just fine.

Maybe better to not run that script directly, as it is being called from the ci_build.sh, which does some setups before running the whole check.

I recommend to check instead static_code_analysis, as it is the main script for checking the formatting. It starts here: cppchecks

med-ayssar avatar Oct 04 '22 15:10 med-ayssar

@hakonsbm By the way, is it possible to squash commits that are already pushed somewhere? Would be great if I could squash all commits of this PR into a single one once everything is done.

JanVogelsang avatar Oct 04 '22 17:10 JanVogelsang

I would think that squashing could make sense in this case, but I would leave the final word to @jougs and @terhorstd.

heplesser avatar Oct 05 '22 05:10 heplesser

@JanVogelsang @med-ayssar format_all_c_c++_files.sh is not being called from the CI, as that would change the files, but is meant as a tool for developers (see Check your code in the documentation). The script defaults to the command clang-format, but you can give a different command by setting the CLANG_FORMAT variable:

CLANG_FORMAT=clang-format-13 build_support/format_all_c_c++_files.sh

hakonsbm avatar Oct 05 '22 07:10 hakonsbm

@hakonsbm I am unsure why the script is not working out of the box, as it seems correct to me (and clang-format does exist and can be found from within bash), but defining CLANG_FORMAT=clang-format-13 manually does the job as well, so that will work.

JanVogelsang avatar Oct 05 '22 10:10 JanVogelsang

@JanVogelsang Thanks, I have marked most of the matters I had raised as resolved.

heplesser avatar Oct 06 '22 21:10 heplesser

Concerning proxynode::get_thread(): I just tried (not in Jan's branch) to make Node::get_thread() virtual and have proxynode::get_thread() override it. If I do that, quite a number of tests fail in the testsuite. A minimal reproducer is

SLI ] << /local_num_threads 4 >> SetKernelStatus
SLI ] /n /parrot_neuron 8 Create def
SLI ] /sr /spike_recorder 8 Create def
SLI ] n sr /one_to_one Connect

I have not systematically tried to go below four threads and eight neurons. The assert(false) triggers when get_thread() is called from https://github.com/nest/nest-simulator/blob/e1c80d01f8318b5b2ce5d2f292cafb1f0e26047d/nestkernel/connection_manager.cpp#L1448

Without the virtual, source_thread is assigned whatever Node::get_thread() here returns. This is not really problematic, because it so happens that the following code https://github.com/nest/nest-simulator/blob/e1c80d01f8318b5b2ce5d2f292cafb1f0e26047d/nestkernel/connection_manager.cpp#L1449-L1471

will in the end just return NO_CONNECTION if the node is a proxy.

But this does not seem clean to me. @jougs, what do you think?

NB: If there is any need to act on this, we need to open a new issue.

heplesser avatar Oct 06 '22 22:10 heplesser

Just an update to my previous post: I think we can solve the problem by changing the order of test in the code above in such a way that we first check for is_proxy() and if that is true, we never ask for the thread. I will try to produce a PR for that later.

heplesser avatar Oct 07 '22 06:10 heplesser

I am genuinely confused why, but after running clang-tidy again now (even though I am using clang-tidy v13 now instead of v15 to make it consistent with our clang-format version) I get 936 more occurrences of

  • Missing overrides
  • Using nullptr instead of NULL/0
  • Redundant void argument Should I fix those as well either in this PR after everything else is resolved or in a separate PR?

JanVogelsang avatar Oct 07 '22 08:10 JanVogelsang

Please create another PR, as to reduce confusion for reviewers.

jougs avatar Oct 07 '22 09:10 jougs