pencil icon indicating copy to clipboard operation
pencil copied to clipboard

Unify code style

Open scribblemaniac opened this issue 4 years ago • 12 comments

We talked about the code style a while back in #683 and mostly agreed upon all of the points, but never did anything with that. Now I have compiled all of the stylistic rules for this project and have used them to create an astyle options file to automate some of the rules, and a Qt Creator style file to make that editor behave a little bit better.

There are still many rules not captured in the astyle options, so we'll have to pay attention to those manually for now. It's possible that we can make some regex statements to check for some of the other rules like the documentation format, but that's not a high priority. I will also make a write up of the recommended style guidelines with examples, but I have not included it here because I'm not sure where I'm going to put it yet.

I will probably merge this after a few days with or without review because I do want this in before Hacktoberfest begins and it theoretically shouldn't affect functionality of Pencil2D in any way.

Closes #683.

scribblemaniac avatar Sep 28 '19 06:09 scribblemaniac

Changed my mind on this PR. I'll let it wait until after Hacktoberfest because a) applying the astyle changes will probably make it difficult to merge PRs based off of code before the style changes and b) I have not finished the human-readable documentation for the style guidelines.

scribblemaniac avatar Oct 03 '19 06:10 scribblemaniac

Style changes usually lead to massive merge conflicts, better to do it after merging the big PRs.

chchwy avatar Oct 03 '19 07:10 chchwy

I think now that v0.6.5 has been released, we should merge this and apply the astyle styling to all the code. Existing PRs and branches can merge up to and including commit 85432d9b14e23a3f03382a46a41a043448afc1e5, then apply the astyling to everything, commit it, then merge the remainder of master. It's not realistic to wait for all PRs to be merged, as I'm beginning to think that may never happen.


A Practical PR Merging Example

Master commit history*

85432d9 Exclude miniz library from astyle
1111111 Apply astyle to all code
2222222 Some other commits to master
3333333 Some other commits to master

PR commit history before merging

aaaaaaa The main commit of the PR, before 85432d9

Now to easily merge this PR with master, check it out and open up terminal in the main directory. Run the following commands:

git fetch --all
git merge 85432d9
astyle -rn --project "./*.cpp" "./*.h"
git commit -am "Apply astyle for PR #"
git pull origin master

In theory, this should produce a commit history on the PR like this:

aaaaaaa The main commit of the PR, before 85432d9
85432d9 Exclude miniz library from astyle
1111111 Apply astyle to all code
2222222 Some other commits to master
3333333 Some other commits to master
bbbbbbb Merge 85432d9 into pr-branch-name
ccccccc Apply astyle for PR #
ddddddd Merge pencil2d/pencil/master into pr-branch-name

A little messy? Yes. But it gets the job done and does it without the large number of conflicts that would likely result if you just did git pull right away.


I have plans to make a pre-commit hook for this so you can set it and forget it. Until then however, you will all have to run the astyle command astyle -rn --project "./*.cpp" "./*.h" before making your commits (at least the last one in a PR) or the build tests will fail if you have not followed the style guidelines. I am also planning on making formal guidelines to be placed on the wiki and in the generated documentation, and I will do so quickly if we are merging this.

@pencil2d/developers Thoughts? I am willing to help bring existing PRs in line with master if necessary. However, if these styling guidelines are going to be meaningful, I need you guys to be behind it.

scribblemaniac avatar Aug 18 '20 17:08 scribblemaniac

I feel like it's not the time yet. we still have a big coming PR undo/redo rewriting. BTW, I am yet convinced that putting * and & next to the identifiers is a good idea. I know where it comes from in the C language, but it makes less and less sense in the post C++11 era.

chchwy avatar Aug 19 '20 01:08 chchwy

I feel like it's not the time yet. we still have a big coming PR undo/redo rewriting.

We have to stop waiting for other PRs at some point. By the time that's merged, the mypaint branch or some other big thing will be waiting. As I've pointed out, this will not significantly increase the difficulty of merging as long as the procedure I mentioned is followed. We've just finished a release, I can't think of a better time to do it than now.

BTW, I am yet convinced that putting * and & next to the identifiers is a good idea. I know where it comes from in the C language, but it makes less and less sense in the post C++11 era.

I think it has obvious advantages for multiple declarations:

Object* obj1, obj2;
obj1 = new Object();
obj2 = new Object(); // NOPE!!!

To me, this is a very compelling reason to put the */& next to the identifier as it much more clearly shows that it applies to the only the first identifier, unlike the type which applies to both of them. I get how it has come to be more associated with the type in C++, but putting the character next to the identifier does not produce in any potential confusion/errors to the degree that it can next to the type. At the end of the day I'll go with whatever the majority of the team agrees upon. Once we start using astyle, it will be trivial to change this at any point.

scribblemaniac avatar Aug 19 '20 18:08 scribblemaniac

Personally I want to take a look at clang-format before this is merged. I feel like it has gained a lot of traction since we first discussed this three years ago, and I’ve heard some good things about it. It’s also supported out-of-the-box in Qt Creator (as well as CLion and IIRC also Visual Studio), so we could likely drop the Qt Creator specific configuration. Regardless of whether we actually end up using it, I at least want to give it a try. I’ll get that done ASAP to avoid blocking this PR over it.

That aside, I’ll second what @scribblemaniac just said about * and &.

J5lx avatar Aug 20 '20 17:08 J5lx

ok, I will give my reasons for that asterisk things. This is based on my experience in large scale C++ projects for many years. But I will follow the team anyway if you guys don't agree with my opinion.

  1. It's always good to give an initial value when declaring a variable.

For the code example

Object* obj1, obj2;

At least it needs to be:

Object* obj1 = nullptr, obj2 = nullptr;

Then the compiler will give you a type mismatch error. So personally I would do this:

Object* obj1 = nullptr;
Object* obj2 = nullptr;

The code is properly initialized (and clearer, uh, from my point of view).

It won't compile in the first place if I mess up the type. There is no way to make the mistake of NULL & 0 in the early days, especially after C++11 introduced nullptr.

Giving initial values is important. It has cost me significant time in debugging throughout my career, sometimes up to weeks and it's frustrating when I founded it's just an uninitialized variable. That's why I strongly prefer to have initial values on declarations. I don't want to spend more brainpower to think "ok, maybe it will be initialized in next a couple of lines".

Every modern language gives variables default values except C++, so it's we developer's duty to make sure it happens ;)

For references, you just can't declare multiple references without initializing them in one line. So it's actually only for pointers.

Object &obj1, &obj2; // no way
  1. Multiple declarations

Multiple declaration is very common in C89 because C89 compilers force you to declare variables at the start of a block. There is always a bunch of declarations at the start of a function in C89 libraries. It's not about style, it's because you have to.

In C++ we don't have this limit. It's better to define a variable right before you need it. Minimizing the scope of a variable helps organize the code as well as helps readability because it minimizes the number of variable names I need to memorize.

For the code example, I would personally prefer to declare Object* and new the object at once if I can.

Object* obj1 = new Object;
Object* obj2 = new Object;

So multiple pointers declarations in one line, at least for me, doesn't have that many benefits.

  1. Differentiate declaration and operator

I can easily distinguish pointer declaration and pointer deference by the position of * and & without thinking.

Object*: * next to type, it's pointer declaration. *obj: * next to identifiers, it's pointer dereferencing. no context needed.

As well as the case of declaring a reference and getting the address of a variable.

  1. It's part of Type

I don't know how you see these:

st::map<Key *, Value *> *myFunc(Param1 *p2, Param2 *p2)
{
    auto obj1 = dynamic_cast<Object *>(base);
    ...
}

Type is everywhere in C++. In many cases, the space between * and identifiers is not necessary.

For the code above, I have to recognize the return type st::map<Key *, Value *>* by telling my eyes skipping the space between std::map and *myFunc. Even space is naturally a syntax element separator.

Well, that's all I want to say. I know it's more like a preference rather than a best practice with solid underlying theory. If you guys still think asterisk bound to identifiers is better, I will follow. Sigh, what a bad language syntax of C++.

@J5lx clang-format would be awesome if it works as a per project setting. All my editors have been configured to put asterisks next to the type to follow my company's coding style. I will definitely need it if we want to go another way.

chchwy avatar Aug 21 '20 02:08 chchwy

First of all, thanks a lot for your detailed explanation, Matt. I still want to think about it a bit more, so I’ll be responding to that later.

For now, here’s what I noticed when trying out clang-format (compared with the AStyle configuration from this PR), in no particular order:

  • clang-format can’t enforce braces on control statements. Apparently it was considered out-of-scope at least at some point, but so was include ordering which has since been added. Either way, currently clang-tidy is needed for this.
  • Unlike astyle, clang-format does not have built-in support for recursive formatting, so it needs to be combined with something like find.
  • Neither astyle nor clang-format can differentiate between references/pointers that have an identifier after them and ones that don't.
  • clang-format can sort includes, which I think is pretty neat. It can even (re-)group them according to custom rules, but I haven’t tried that yet. Don’t know whether astyle can do this or if it’s just not enabled in the configuration. Can turn this off in clang-format if necessary.
  • clang-format currently can’t do Allman braces for lambdas. However, this issue seems to have been fixed (or at least added as an option) in the next version which is set to be released in less than a week.
  • As was already written above, astyle’s formatting of inline lambdas is sometimes pretty awful, the result from clang-format is much better.
  • I like clang-format’s handling of multi-line stuff better.
  • clang-format seems to be slightly better at keeping the line length in check.
  • However it also likes to put stuff on a single line when possible, which is sometimes slightly better and sometimes slightly worse (IMO).
  • Overall clang-format seems to be a little more comprehensive than astyle. That is, there are some cases which clang-format handles but astyle doesn’t, the other way around it’s mostly just the lambda brace thing mentioned above.
  • On the other hand, clang-format also kinda lends itself more towards “opinionated” formatting. That is, you have to use all of the options with one setting or another and it also touches some code where it’s not strictly necessary. Because of this, clang-format is actually stricter than what we originally agreed upon. The overall result is still sensible though (IMO).
  • clang-format has a penalty system for changing what to avoid, but I haven’t played around with it so far.

So far, I have only glanced over the formatted code, so I might have missed some stuff. But that aside I actually slightly prefer the results from clang-format so far. That said, I made the assumption that the astyle configuration from this PR is already as good as possible, so there’s a chance I’m a bit biased if some of the astyle stuff above can actually be fixed/changed in configuration.

Here are some comparisons:

Here is the configuration file I used. As I mentioned, clang-format wants configuration for everything, so in those cases that we didn’t originally discuss I tried to choose what I think is most common in our code (from the top of my head).

Sorry for this wall of text. tl;dr: Astyle and clang-format perform mostly similar (with astyle’s broken inline lambda formatting being perhaps the most notable exception, and for clang-format the need to resort to clang-tidy for enforcing braces), though personally I like clang-format’s formatting a little better. And then there’s clang-format’s IDE support, which was one of my main motivations for trying it out. What do you think?

J5lx avatar Aug 21 '20 14:08 J5lx

@J5lx I see your wall of text and raise you one :sunglasses: . The way I see it, clang-format is roughly on-par with astyle. I think one of the major issues with it is with manual execution. If we had the pre-commit hooks set up, this wouldn't be a huge problem, but as it stands right now, it seems like it would be much more of a pain to run than astyle. Astyle is one command with a few parameters as listed above, but for clang-format you would need to use clang-format and clang-tidy, and run them both with find or a similar utility. This isn't that much trouble for me, but it could be more trouble for new contributors and for making a script which works well across all platforms (esp. Windows). This is even worse when you want to ignore things like the miniz library and potentially more in the future.

To talk about a few points of difference:

  • I have had a look at the issue with the lamba indentation issue I pointed out earlier, and I think what this actually mostly comes down to is the indent-after-parens option which is currently not set so the default behavior is to line it up with the parentheses. The braces might still appear unindented, I have not tested this, but this can't really be compared to clang-format, where it is actually using the incorrect brace style for lambda expressions, as you mentioned.
  • I should note that astyle has just about double the number of options as clang-format. Size isn't everything though :stuck_out_tongue_closed_eyes:

clang-format can sort includes, which I think is pretty neat. It can even (re-)group them according to custom rules, but I haven’t tried that yet. Don’t know whether astyle can do this or if it’s just not enabled in the configuration.

Astyle cannot do this. I'm pretty sure I considered writing some regex voodoo to do this before because I really like the idea of ordered headers. Though we never talked about it, I am definitely am a supporter of this idea.

However it also likes to put stuff on a single line when possible, which is sometimes slightly better and sometimes slightly worse

I completely agree.

clang-format seems to be slightly better at keeping the line length in check.

The configuration could be partly to blame for that, I did not set max-code-length so astyle will let it get as long as it wants. If that is set, it tries to break it in logical places, probably very similar to clang-format. clang-format has a default column limit of 100 which is why it is automatically doing that.

That said, I made the assumption that the astyle configuration from this PR is already as good as possible, so there’s a chance I’m a bit biased if some of the astyle stuff above can actually be fixed/changed in configuration.

At the time I did this (about a year ago now), this was very close to the ideal astyle configuration. I combed through the docs to get as close to the agreed upon style guidelines as possible. Any new features or improvements that have been made since then are not included, but it doesn't seem that there has been a major release since then.

And then there’s clang-format’s IDE support, which was one of my main motivations for trying it out.

Actually astyle is supported by the Qt Creator just about as well as clang-format/clang-tidy. Both are supported with the Beautifier plugin: https://doc.qt.io/qtcreator/creator-beautifier.html, though it does seem that clang-format has the extra option in this plugin to format only a specific section of text.

scribblemaniac avatar Sep 14 '20 02:09 scribblemaniac

Now for a response to you @chchwy.

  1. It's always good to give an initial value when declaring a variable.

I agree with this, and I'm not saying that anyone shouldn't. Sometimes I don't do this when I know what the default initialization is and that's what I want (esp. for POD types), but perhaps that's a little lazy of me. Even if we say to always do it that way though, variables that are not immediately initialized will happen, especially if we can't enforce that in an automated way (which we can't with astyle or clangformat as far as I know).

And even if the values are always initialized it will not guarantee that the issue is caught. To add to the pile of contrived examples:

QString* string1 = nullptr, string2 = nullptr;

Will compile just fine. Presumably because of it's const char * constructor and copy operation, although I'll admit I don't feel confident enough in my understanding to say exactly what's going on, only that it clearly can go on.

  1. Multiple declarations ... In C++ we don't have this limit. It's better to define a variable right before you need it. Minimizing the scope of a variable helps organize the code as well as helps readability because it minimizes the number of variable names I need to memorize.

I also agree with this. Personally I prefer to use multiple declarations when two variables are closely related. For instance, if I was dealing with a 2d array, I might initialize a variable like int x, y;. This very rarely applies to pointers, and when it does I would probably use multiple lines anyway just to avoid this issue as completely as possible. Multiple declaration is an option, and I think it still has value even when trying to minimize scope and improve readability.

  1. Differentiate declaration and operator

I will concede this point, it is easier from a readability standpoint to distinguish between type and derefrencing. The way I look at it is probably something along the lines of:

  1. Is it a declaration? If yes, it's related to the type.
  2. Does it precede an identifier? If yes, it's related to dereferencing.
  3. Otherwise it is related to the type. Now this isn't perfect, but it's simple enough and works 99.9% of the times you will see the * character used (ignoring multiplication).
  1. It's part of Type

My understanding is this is something that evolved along with C++. I definitely think of it as part of the type in casual usage, and especially when it is inside <>. However, that only makes it more confusing for the multiple declaration issue where if it was 100% part of the type, all variables you define after using it would be pointers, and obviously that's not the case.

Part of me wants to think of it as a type modifier. After all, you can't declare a type with just * (void functionally achieves that, but then void could be viewed as the type). All other type modifieres, const, unsigned, etc. go in front of the type with a space:

* Object obj;

Well clearly this is not valid C++. The style which most closely resembles this and is valid would be Object * obj;. But nobody, myself included, really seems to like that style so this path of logic isn't particularly helpful to our current predicament. But maybe it will make you look at <Object *> a little less harshly, because I bet you never looked at <const Object> and thought, "that ought to be <constObject> :wink:

I agree with a lot of the things you bring up, but I think some of them are not strong justifications for using your style, but are rather reasons why we don't need to use the other style, which should not be confused with reasons against the other style. Taken collectively, I still think situations where your style could cause an issue are still a possibility and I think that outweighs the improvements in readability. I will also ultimately follow the whatever the majority agrees upon as well. Finally I want to stress that this is not written in stone, it can be easily changed later when we have this style automation in place. It really shouldn't hold up progress of this PR because I think we can all agree that a consistent style, whether it's the one we want or not, is better than half the code being one way and half the other way!

scribblemaniac avatar Sep 14 '20 04:09 scribblemaniac

I would like to also bring up vera++. Apparently it was one of the things I reviewed before settling on astyle. I probably did not use it because its set of default rules are pretty limited compared to the other tools mentioned here. However, you can create custom rules for it in Tcl (and now Python and Lua supposedly) which could be really powerful and potentially useful as a secondary step after the main stylization tool is run to implement any more specialized rules we desire (include groupings anyone?). It does not provide a semantic view of the files, aside from identified tokens, so modifying the lambdas may still be beyond its capability (although it may be possible to identify them with some ugly regex). Something to look into going forward at least.

scribblemaniac avatar Oct 10 '20 04:10 scribblemaniac

Alright, let’s try to get this moving again as well. First of all, I looked at the */& discussion again and the more I think about it, the more I feel like either way is fine, really. I don’t really want to go into much detail about it again, personally. However, I do want to second this about Matt’s arguments:

I agree with a lot of the things you bring up, but I think some of them are not strong justifications for using your style, but are rather reasons why we don't need to use the other style, which should not be confused with reasons against the other style.

I feel like this sums up my own feelings pretty well. Oh, and also this about the situation in general:

Sigh, what a bad language syntax of C++.

Anyway – with that out of the way, let’s talk about the formatter software again. And on that front, I looked into one aspect that we didn’t talk about so far, maintenance. And there’s a pretty significant difference between the three options, to say the least:

  • vera++: Most recent release almost seven years ago, most recent commit more than a year ago, issue tracker also pretty much dead since then (no maintainer interaction)

  • astyle: Most recent release almost four years ago, most recent commit more than two years ago, the project has been officially discontinued. The lead dev says:

    If you want to try another formatter, clang-format added features in version 11 that bring it close to Artistic Style. It also has paid programmers and is more likely to be around 10 years from now.

  • clang-format: three major LLVM releases within the last year, most recently just two days ago, adding bugfixes, enhancements and new formatting options

IMO, this is not even a competition anymore, it’s pretty hard to justify settling on an unmaintained piece of software. clang-format may not be perfect, but at least there’s a realistic chance that issues will be addressed. In fact, the lambda brace style issue mentioned previously was already addressed about a year ago. And as far as recursive invocation and use of clang-tidy is concerned, I think it should be easy enough to add some bash/batch scripts using find or forfiles to automate that. There’s also git-clang-format which is a neat little helper that runs clang-format on code that has been changed. And it might even be worth looking into clang-tidy more generally as a linter / static analyser to catch potential issues early (that is something that should be discussed separately, though).

Actually astyle is supported by the Qt Creator just about as well as clang-format/clang-tidy. Both are supported with the Beautifier plugin

I was actually talking about the newer, dedicated ClangFormat plugin which goes further than Beautifier and can also format your code as you type :)

custom rules for it in Tcl (and now Python and Lua supposedly) […] to implement any more specialized rules we desire (include groupings anyone?).

psst, clang-format has that built-in ;)

Anyway, I think it’s pretty clear at this point which my preferred option is ^^ But do let me know what you think!

J5lx avatar Oct 03 '21 19:10 J5lx