pybaseball
pybaseball copied to clipboard
Developing a style guide
So in a recent PR I tried to bring in some formatting changes (not necessarily on purpose, mostly because it was my first PR, and I always have some sort of auto pep 8 formatter on, 😆).
This led to quite a few unrelated code changes and some understandable concern on @schorrm 's part (expecially for some of the choices that were made by the formatter.
However, I think (and I believe @schorrm agrees to some extent) that adding some code style standards could be fruitful, and if we can all coalesce around a shared tool and config to keep it painless, the better! The goal of the style guide would to make the code more readable and internally consistent.
So I'd like to use this issue to discuss what some participants like in a style guide, don't like in a style guide, or are apathetic to.
I'll begin with a few of mine.
- I prefer capping lines at a length of 120, but am not opposed (and frequently will) break apart lines smaller than that if I feel like it will improve readability. For example:
# Technically legal
cols = [col.replace('*', '').replace('#', '') for col in cols]
# More readable in my opinion
cols = [
col.replace('*', '').replace('#', '') for col in cols
]
# For extra long lines I'd even break it this way as well
cols = [
col.replace('*', '').replace('#', '').extraLongFunctionGoesHereToTakeUpRoom()
for col in cols
]
- Along those same lines I prefer to break apart long strings like this:
my_string = "Pretend this string goes on for something like 120 characters... " +
"The rest of the string goes here."
- Along those same lines I think for long lines that are function calls, treating the function parens like curly brackets in other languages makes for a clean look:
ata = fangraphs.get_fangraphs_tabular_data_from_url(
_FG_TEAM_PITCHING_URL.format(
start_season=start_season,
end_season=end_season,
league=league,
ind=ind,
)
)
- I prefer spacing around my type declarations, variable assignments, and operators:
def team_pitching(start_season: int = None):
for season in range(start_season, start_season + 1):
pass
- For longer function definitions I prefer splitting them apart per parameter:
def team_pitching(
start_season: int,
end_season: int = None,
league: str = 'all',
ind: int = 1,
):
-
I also really prefer when the code gets a pylint score of 10.0, but there are some linting failures I don't flip out about (like docstrings on modules).
-
When possible, I prefer type hinting in function params and returns so MyPy can help catch misuse before runtime.
-
I would like to eliminate all print statements if possible. Print statements are an uncontrolled side effect for anyone using the library downstream. Instead we should use the logging library and give the user some control over where the logs go:
https://docs.python.org/3/howto/logging.html#logging-basic-tutorial
To be clear, these are my personal preferences. Not trying to dictate. 😄
Also for reference, here is Google's style guide:
https://google.github.io/styleguide/pyguide.html
When there are things like long column specifiers, having ~125 char lines sometimes will be clearer.
# Technically legal cols = [col.replace('*', '').replace('#', '') for col in cols] # More readable in my opinion cols = [ col.replace('*', '').replace('#', '') for col in cols ]
I see zero readability benefit from this.
I prefer spacing around my type declarations, variable assignments, and operators:
def team_pitching(start_season: int = None): for season in range(start_season, start_season + 1): pass
Agreed to this one.
For longer function definitions I prefer splitting them apart per parameter:
def team_pitching( start_season: int, end_season: int = None, league: str = 'all', ind: int = 1, ):
I do not. I prefer
def team_pitching(start_season: int, end_season: int = None, league: str = 'all', ind: int = 1):
Re the other guidelines:
- I like general use of type annotations in function signatures.
- I think we should use f-strings where possible (though not necessarily backport everything to it). And other things like the crazy long URLs defining once and once and then
.format
-ing them is preferable. - I believe that we shouldn't have a policy about
" / '
. It's a waste of time. And if one of them saves you an escape in a given string, use it. - As we start using more and more type hinting, we need to start doing type aliasing.
Union[str, int, float, datetime]
is too much.ColType
or something. - We need a relatively long column max. No way 80, 120+.
# More readable in my opinion cols = [ col.replace('*', '').replace('#', '') for col in cols ]
I see zero readability benefit from this.
For longer function definitions I prefer splitting them apart per parameter:
def team_pitching( start_season: int, end_season: int = None, league: str = 'all', ind: int = 1, ):
I do not. I prefer
def team_pitching(start_season: int, end_season: int = None, league: str = 'all', ind: int = 1):
By way of explanation of my reasoning, here is why I prefer each of those. To me it's about cognitive load. If I'm reading through and understanding code, I find it's easier when longer pieces of code are pre-broken into manageable chunks. I think of it as extra lines are free, brain cells are expensive. So if I can trade one for the other I will. This is just my preference though. It seems you prefer cohesive chucks of code, and I'm sure there's an argument that joining those separated lines takes brain cells as well.
We can see what others think as well. My preference (especially as a newcomer to the project) isn't especially important. 😄
This line is 103 chars. I think it is a good example of being clean as-is
mlb_only_cols = ['key_retro', 'key_bbref', 'key_fangraphs', 'mlb_played_first', 'mlb_played_last']
If we do have one -- how would we use the file thingy? yapf sounds like the normal answer, yet seems perennially broken on VSCode
If we do have one -- how would we use the file thingy? yapf sounds like the normal answer, yet seems perennially broken on VSCode
We could add a .vsocde/settings.json that specifies using yapf. We could also add a config file for yapf to the repo that would allow us to set the preferred style:
YAPF will search for the formatting style in the following manner:
Specified on the command line
In the [style] section of a .style.yapf file in either the current directory or one of its parent directories.
In the [yapf] section of a setup.cfg file in either the current directory or one of its parent directories.
In the [style] section of a ~/.config/yapf/style file in your home directory.
More info here: https://github.com/google/yapf#formatting-style
This line is 103 chars. I think it is a good example of being clean as-is
mlb_only_cols = ['key_retro', 'key_bbref', 'key_fangraphs', 'mlb_played_first', 'mlb_played_last']
Yeah, I'd agree. I wouldn't automatically break this up (though to be 100% honest, I probably wouldn't complain if someone did either).
I mean my VSCode yapf is always broken
Strange. Does yapf from the command line work?
If so, then in your "Output" console on the bottom change the dropdown to Python, and see what it may be showing.
If we can get that working, I can create a yapf branch that we can tweak the style rules to get them where we want them.
Just an idea: black
--- you get zero customization, but it's stupid simple and its style is inoffensive to basically everyone. Integrates with all editors, and can be run on all PRs as a GitHub action.
Maybe blue if they do a CI - we've been single quotes until this point.
Just an idea:
black
--- you get zero customization, but it's stupid simple and its style is inoffensive to basically everyone. Integrates with all editors, and can be run on all PRs as a GitHub action.
The only thing about the "opinionated" formatters, is I hate most of their opinions. 😆