pybaseball icon indicating copy to clipboard operation
pybaseball copied to clipboard

Developing a style guide

Open TheCleric opened this issue 4 years ago • 15 comments

So in a recent PR I tried to bring in some formatting changes (not necessarily on purpose, mostly because it was my first PR, and I always have some sort of auto pep 8 formatter on, 😆).

This led to quite a few unrelated code changes and some understandable concern on @schorrm 's part (expecially for some of the choices that were made by the formatter.

However, I think (and I believe @schorrm agrees to some extent) that adding some code style standards could be fruitful, and if we can all coalesce around a shared tool and config to keep it painless, the better! The goal of the style guide would to make the code more readable and internally consistent.

So I'd like to use this issue to discuss what some participants like in a style guide, don't like in a style guide, or are apathetic to.

I'll begin with a few of mine.

  • I prefer capping lines at a length of 120, but am not opposed (and frequently will) break apart lines smaller than that if I feel like it will improve readability. For example:
# Technically legal
cols = [col.replace('*', '').replace('#', '') for col in cols]

# More readable in my opinion
cols = [
    col.replace('*', '').replace('#', '') for col in cols
]

# For extra long lines I'd even break it this way as well
cols = [
    col.replace('*', '').replace('#', '').extraLongFunctionGoesHereToTakeUpRoom()
    for col in cols
]
  • Along those same lines I prefer to break apart long strings like this:
my_string = "Pretend this string goes on for something like 120 characters... " +
    "The rest of the string goes here."
  • Along those same lines I think for long lines that are function calls, treating the function parens like curly brackets in other languages makes for a clean look:
ata = fangraphs.get_fangraphs_tabular_data_from_url(
    _FG_TEAM_PITCHING_URL.format(
        start_season=start_season,
        end_season=end_season,
        league=league,
        ind=ind,
    )
)
  • I prefer spacing around my type declarations, variable assignments, and operators:
def team_pitching(start_season: int = None):
    for season in range(start_season, start_season + 1):
        pass
  • For longer function definitions I prefer splitting them apart per parameter:
def team_pitching(
    start_season: int,
    end_season: int = None,
    league: str = 'all',
    ind: int = 1,
):
  • I also really prefer when the code gets a pylint score of 10.0, but there are some linting failures I don't flip out about (like docstrings on modules).

  • When possible, I prefer type hinting in function params and returns so MyPy can help catch misuse before runtime.

  • I would like to eliminate all print statements if possible. Print statements are an uncontrolled side effect for anyone using the library downstream. Instead we should use the logging library and give the user some control over where the logs go:

https://docs.python.org/3/howto/logging.html#logging-basic-tutorial

TheCleric avatar Sep 03 '20 19:09 TheCleric

To be clear, these are my personal preferences. Not trying to dictate. 😄

TheCleric avatar Sep 03 '20 19:09 TheCleric

Also for reference, here is Google's style guide:

https://google.github.io/styleguide/pyguide.html

TheCleric avatar Sep 04 '20 00:09 TheCleric

When there are things like long column specifiers, having ~125 char lines sometimes will be clearer.

schorrm avatar Sep 05 '20 21:09 schorrm

# Technically legal
cols = [col.replace('*', '').replace('#', '') for col in cols]

# More readable in my opinion
cols = [
    col.replace('*', '').replace('#', '') for col in cols
]

I see zero readability benefit from this.

I prefer spacing around my type declarations, variable assignments, and operators:

def team_pitching(start_season: int = None):
    for season in range(start_season, start_season + 1):
        pass

Agreed to this one.

For longer function definitions I prefer splitting them apart per parameter:

def team_pitching(
    start_season: int,
    end_season: int = None,
    league: str = 'all',
    ind: int = 1,
):

I do not. I prefer

def team_pitching(start_season: int, end_season: int = None, league: str = 'all', ind: int = 1):

schorrm avatar Sep 06 '20 11:09 schorrm

Re the other guidelines:

  • I like general use of type annotations in function signatures.
  • I think we should use f-strings where possible (though not necessarily backport everything to it). And other things like the crazy long URLs defining once and once and then .format-ing them is preferable.
  • I believe that we shouldn't have a policy about " / '. It's a waste of time. And if one of them saves you an escape in a given string, use it.
  • As we start using more and more type hinting, we need to start doing type aliasing. Union[str, int, float, datetime] is too much. ColType or something.
  • We need a relatively long column max. No way 80, 120+.

schorrm avatar Sep 06 '20 11:09 schorrm

# More readable in my opinion
cols = [
    col.replace('*', '').replace('#', '') for col in cols
]

I see zero readability benefit from this.

For longer function definitions I prefer splitting them apart per parameter:

def team_pitching(
    start_season: int,
    end_season: int = None,
    league: str = 'all',
    ind: int = 1,
):

I do not. I prefer

def team_pitching(start_season: int, end_season: int = None, league: str = 'all', ind: int = 1):

By way of explanation of my reasoning, here is why I prefer each of those. To me it's about cognitive load. If I'm reading through and understanding code, I find it's easier when longer pieces of code are pre-broken into manageable chunks. I think of it as extra lines are free, brain cells are expensive. So if I can trade one for the other I will. This is just my preference though. It seems you prefer cohesive chucks of code, and I'm sure there's an argument that joining those separated lines takes brain cells as well.

We can see what others think as well. My preference (especially as a newcomer to the project) isn't especially important. 😄

TheCleric avatar Sep 06 '20 16:09 TheCleric

This line is 103 chars. I think it is a good example of being clean as-is

    mlb_only_cols = ['key_retro', 'key_bbref', 'key_fangraphs', 'mlb_played_first', 'mlb_played_last']

schorrm avatar Sep 11 '20 12:09 schorrm

If we do have one -- how would we use the file thingy? yapf sounds like the normal answer, yet seems perennially broken on VSCode

schorrm avatar Sep 11 '20 12:09 schorrm

If we do have one -- how would we use the file thingy? yapf sounds like the normal answer, yet seems perennially broken on VSCode

We could add a .vsocde/settings.json that specifies using yapf. We could also add a config file for yapf to the repo that would allow us to set the preferred style:

YAPF will search for the formatting style in the following manner:

Specified on the command line
In the [style] section of a .style.yapf file in either the current directory or one of its parent directories.
In the [yapf] section of a setup.cfg file in either the current directory or one of its parent directories.
In the [style] section of a ~/.config/yapf/style file in your home directory.

More info here: https://github.com/google/yapf#formatting-style

TheCleric avatar Sep 11 '20 13:09 TheCleric

This line is 103 chars. I think it is a good example of being clean as-is

    mlb_only_cols = ['key_retro', 'key_bbref', 'key_fangraphs', 'mlb_played_first', 'mlb_played_last']

Yeah, I'd agree. I wouldn't automatically break this up (though to be 100% honest, I probably wouldn't complain if someone did either).

TheCleric avatar Sep 11 '20 13:09 TheCleric

I mean my VSCode yapf is always broken

schorrm avatar Sep 11 '20 15:09 schorrm

Strange. Does yapf from the command line work?

If so, then in your "Output" console on the bottom change the dropdown to Python, and see what it may be showing.

If we can get that working, I can create a yapf branch that we can tweak the style rules to get them where we want them.

TheCleric avatar Sep 11 '20 19:09 TheCleric

Just an idea: black --- you get zero customization, but it's stupid simple and its style is inoffensive to basically everyone. Integrates with all editors, and can be run on all PRs as a GitHub action.

reddigari avatar Apr 26 '21 16:04 reddigari

Maybe blue if they do a CI - we've been single quotes until this point.

schorrm avatar Apr 26 '21 17:04 schorrm

Just an idea: black --- you get zero customization, but it's stupid simple and its style is inoffensive to basically everyone. Integrates with all editors, and can be run on all PRs as a GitHub action.

The only thing about the "opinionated" formatters, is I hate most of their opinions. 😆

TheCleric avatar Apr 26 '21 17:04 TheCleric