mdanalysis Code formatting, style guide enforcement and static checks

I recently started using code formatting (with black), style guide enforcement (with flake8) and static code checks (with mypy). I tried to apply them to MDAnalysis for fun and came across to many warnings.

Describe the solution you'd like

I think it would be great to format MDA with black (or other tools) and then use flake8 and mypy (or other tools) to spot and fix possible problems.

Describe alternatives you've considered

There are other tools that can be considered: pylint, isort, ypaf, ...

Additional context

Many errors come from MDAnalysis/migration/.

You can have a look at this commit to see the result of automatic code formatting with black.

Jan 16 '20 14:01 RMeli

We had (or have?) pylint on, but it sometimes throws annoying errors which stop a PR merging.

i think with something like black, we'd have to have every contributor follow the style, which would mean either a) instructions for everyone on how to apply it easily or b) some sort of CI system which fixes up the formatting on merging. Both seem like they'd add ongoing effort for not much real gain?

Jan 17 '20 10:01 richardjgowers

@richardjgowers I just wanted to spark the discussion but I agree the ratio benefits/effort might be small. However, #2409 and #2411 could be easily spotted by static checks. I also have found some indentation errors here and there that will make the code fail when executed.

Automatic code formatting (with black or other) is useful only if used with pylint or flake8, so that it removes most of the useless formatting errors. (It might reduce commit diffs if everyone uses it, but this is a very marginal benefit). However, it's really easy to install (pip install black) and apply (black .).

Jan 17 '20 10:01 RMeli

I am all in favour of linters and static analyzers. I am considering adding type hints when we would have dropped python 2. I am less keen on formating tools, though: the way I format my code is often deliberate and, sometimes, going against the convention helps with readability or other things.

Jan 17 '20 17:01 jbarnoud

I think adding type hints will be a great!

I understand that automatic code formatting is not something every developer is willing to accept, but it would be the easiest way to remove most of the linting problems in one go at the beginning. I haven't tried pylint, but flake8 gives very reasonable complaints (when used after automatic code formatting).

Most of the tools can be configured in setup.cfg in order to reach a general consensus.

Feb 17 '20 23:02 RMeli

I'd suggest only applying the CI linting to the diff if we do this, rather than driving the full code base through a large amount of churn. SciPy is working on adopting something like this in: https://github.com/scipy/scipy/pull/11423

I'd be cautious about adopting type hints---maybe if/when they become widely adopted at the base of our dependency tree in NumPy/SciPy, etc.

Feb 21 '20 20:02 tylerjereddy

I like the idea of linting only the diff! What is the drawback of type hints if they are not well adopted by the dependencies?

Feb 22 '20 15:02 RMeli

There may not be any major drawback, I just mean it could be sensible to wait to see what the rest of the ecosystem decides to do (if anything) about hinting in scientific code.

For example, NumPy has an experimental repo for type hinting with arrays: https://github.com/numpy/numpy-stubs

See also long-standing issue here: https://github.com/numpy/numpy/issues/7370

Perhaps it may have some influence on what we might want to do.

Feb 22 '20 15:02 tylerjereddy

Note that SciPy has started accepting PRs that add type hints to source code, so maybe the time for doing this is not really so far away. May want to wait until we officially drop Python 2 support, as noted above.

Apr 26 '20 21:04 tylerjereddy

+1 on being apprehensive about autoformatters. My main objection is what they do to commit history. It's useful to look at a line of code and see who committed it in which PR, so you can check out any discussion and know who to direct questions to.

CI linting the diffs seem like a great idea and would remove some of the PEP8 comments on reviews.

Jun 05 '20 20:06 lilyminium

Just tagging this additional discussion thread here.

Jun 17 '20 12:06 IAlibay

https://github.com/marketplace/pep-8-speaks may be of interest.

Edit: instead of adding a check that can fail, it simply comments so contributors and reviewers can decide if they want to follow suggestions or not.

Jul 03 '20 06:07 lilyminium

I installed pep8-speaks for all of MDA — if it’s too annoying we can deactivate it again or only enable it for specific repos.

One can also configure it with a config file as described in https://github.com/OrkoHunter/pep8speaks#configuration

Jul 04 '20 06:07 orbeckst

+1 on being apprehensive about autoformatters. My main objection is what they do to commit history. It's useful to look at a line of code and see who committed it in which PR, so you can check out any discussion and know who to direct questions to.

Thanks to @mikemhenry, I realised that it is possible to ignore specific commits when using git blame and it seems that this is now supported by GitHub as well.

I am less keen on formatting tools, though: the way I format my code is often deliberate and, sometimes, going against the convention helps with readability or other things.

For this situation, it is possible to ignore blocks of code or single lines with # fmt: on/off and # fmt: skip.

May 07 '22 11:05 RMeli

hi there, just adding my two cents – prompted by @RMeli in https://github.com/MDAnalysis/UserGuide/pull/270#issuecomment-1644471797 – in my experience code formatters and static analysis tools (black, isort, flake8, mypy) are tremendously valuable. They make the codebase more readable, more maintainable, and lower the bar for entry for new contributors.

These tools (in my experience) are the de-facto standard in the industry, and familiarity with them is valuable. Of course, such tools only solve a narrow problem, they don't solve some of the most important feature requests, tech debt or bugs. But they are typically a low hanging fruit with a positive net impact on the health of a codebase and dev velocity.

Jul 21 '23 13:07 jandom

Hi everyone, just adding another two cents here -- git has a feature that allows you to ignore certain commits in git blame: https://stackoverflow.com/questions/53502654/how-do-i-run-a-code-formatter-over-my-source-without-modifying-git-history

Sadly it requires a configuration beforehand (you must run git config blame.ignoreRevsFile .git-blame-ignore-revs to make it work), and also doesn't (yet) work on the github's web interface -- although it's on a public beta since 8.03.2023 (check here), so I assume it'll be available for everyone relatively soon.

I imagine that messing up git blame was one of the arguments against using automatic formatting -- so just saying there's a workaround here.

Aug 02 '23 10:08 marinegor

I have been on record as one of those (if not the one) that dislike auto-formatting. I had issues with fixes that were suboptimal readability-wise, portions of the code where the formatting was intentional, and reformatting the existing code base breaking git blame.

With time, I come to accept that the few places where the formatting is suboptimal are outweighed by the benefit of auto-formatting. We can have guards where the formatting is intentional (like this superb periodic table). If we can ignore some commits for the blame, then all my objections are addressed.

There is just the question of how to handle pending pull requests that risk being riddled with merge conflicts.

Aug 02 '23 12:08 jbarnoud

We can have guards where the formatting is intentional

I'm still not an amazing fan of black (I am less wary now that that formatting style documentation has improved & there is a "future" section) but I also don't care that much as long as we are careful not to worsen readability / destroy intentional formatting.

There is just the question of how to handle pending pull requests that risk being riddled with merge conflicts.

In my opinion, the best approach would be to cut up the process in chunks. That way we can also review the changes in a sane way rather than have a million+ line changes. We work on files that aren't currently in progress, and if we get blocked by a PR that isn't going anywhere then that gives us a good reason to close stale PRs (of which we have many).

Aug 02 '23 12:08 IAlibay

I would however add here (for the @MDAnalysis/coredevs) that I would urge a discussion on priorities here. We have a lot of maintenance work to do, and many PRs that need a push (including all the type hinting ones and the guesser stuff).

I'm not saying don't do it, I am however saying let's consider the order in which we do things.

Aug 02 '23 12:08 IAlibay

Is it possible that you add a project-wide autoformatter settings, and then for all new PRs you ask people to a) run the formatter on the files they're changing b) add the respective commit to the .git-blame-ignore-revs file? Perhaps this might help reducing the number of merge conflicts, as well as delay the necessity to run it for all files at once.

Aug 02 '23 12:08 marinegor

Is it possible that you add a project-wide autoformatter settings, and then for all new PRs you ask people to a) run the formatter on the files they're changing b) add the respective commit to the .git-blame-ignore-revs file? Perhaps this might help reducing the number of merge conflicts, as well as delay the necessity to run it for all files at once.

I don't think that'll work - you'll get conflicts with files where folks are touching the same file.

Aug 02 '23 12:08 IAlibay

I don't think that'll work - you'll get conflicts with files where folks are touching the same file.

But if the settings are project-wide, their autoformatting (should) match unless they touched the same parts of the code? I'm not sure though, never tried it.

Aug 02 '23 12:08 marinegor

EDIT: Sorry folks — I thought this question happened on Egor's PR, not on the dedicated static code formatters issue ...so this is exactly where discussion should happen... oops. So most of what I wrote doesn't apply really. I am just crossing out the embarrassing lines... except the bit that says if anyone wants to push for such a change they better be ready to be responsible for handling it.

I think we can cut the discussion pretty short: We can consider autoformatters ~~but it ain't gonna happen in this PR as it's too big of a change. I am all for a productive discussion for how to move forward but can we please have it in a sensible place — either an issue or a thread on the developer list?~~

~~@marinegor apologies for stomping in here. From your perspective it's a sensible discussion to ask here as you're focused on your work and want to get it done efficiently.~~ From a project-wide perspective it's a discussion that has come up a few times. It looks as opinions are shifting so it's ok to have this discussion from time to time — nothing is set in stone — but in the context of the project it's a big, big undertaking and we have to weigh cost (=time/opportunity) vs benefit. ~~Let's do this in a dedicated place that won't distract from what your PR is about.~~

So can I ask that if anyone cares enough about the topic of autoformatters, ~~they start a discussion on the dev list (or open/re-open(?) an issue) and point all interested parties there... and~~ then feel responsible for following up on the discussions? Thanks.

Aug 02 '23 13:08 orbeckst

mdanalysis mdanalysis copied to clipboard

Code formatting, style guide enforcement and static checks

mdanalysis
mdanalysis copied to clipboard