shap icon indicating copy to clipboard operation
shap copied to clipboard

Blacken source code?

Open jamesmyatt opened this issue 6 years ago • 5 comments

Black is emerging as the standard code formatter for Python (https://github.com/python/black, https://www.youtube.com/watch?v=esZLCuWs_2Y)

Are you open to reformatting the code using this? If so, I can submit a PR for it.

jamesmyatt avatar Jul 08 '19 11:07 jamesmyatt

@jamesmyatt That is great idea! Would you be interested in making PR? It should be added to pyproject.toml and maybe into pre-commit.

@thatlittleboy Could you confirm that there is no black formatted used at the moment?

detrin avatar Aug 26 '23 11:08 detrin

@detrin We discussed that before over at https://github.com/dsgibbons/shap/issues/21, our conclusion was to not use black for the time being.

I think we can repoen the discussion for it at some stage, but I would say it's not a huge priority.

thatlittleboy avatar Aug 26 '23 13:08 thatlittleboy

@thatlittleboy Okay so I read the other issue and all the points are valid.

My habit is that when I join new project that wasn't maintained properly using black and any fancy tools can make the whole developer experience a bit easier. I see it as motor oil.

One concern though is that reformatting with black might make it very difficult to merge any changes between forks, as the diff would be enormous and merge conflicts would likely arise everywhere. It would be great if the original repo were already formatted with black, but we have to be pragmatic.

Agree.

but I would say it's not a huge priority.

Also, agree. I think splitting the repository would also help to make it more maintainable.

detrin avatar Aug 26 '23 13:08 detrin

@connortann Is this already solved by #3550?

CloseChoice avatar Mar 09 '24 05:03 CloseChoice

No, although it's related: that PR only related to docstrings.

We have applied the black code style (implemented via ruff format) to the notebooks, but not yet to the code. We could indeed extend the autoformatting to the code base, although I've been a bit hesitant so far as the diff is huge and would be a bit of a pain to review. However, I think now is probably a good a time as any, so I'll have a look on Monday.

connortann avatar Mar 09 '24 23:03 connortann

I made a quick script to see what the effect of the formatter for a range of line lengths, counting the lines changed in the resulting git diff.

A line length of 120 seems like a reasonable choice that would keep the diff to a minimum. FYI @CloseChoice

image

Script to reproduce:

import subprocess
import sys
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

REPO = Path("~/projects/shap").expanduser().resolve()
RUFF = "/home/connor/miniforge3/envs/shap/bin/ruff"


def run_repo_command(command) -> str:
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        shell=True,
        text=True,
        cwd=REPO,
    )
    if result.returncode != 0:
        print(f"Error running command '{command}': {result.stderr}")
        sys.exit(1)
    return result.stdout.strip()


def evaluate_line_length_effect(line_length) -> dict[str, int]:
    print("Evaluating line length effect for length:", line_length)
    # Nb. on my local "format" branch, the .py files are included by the ruff formatter.
    run_repo_command("git checkout format")
    run_repo_command("git reset --hard")
    run_repo_command(f"{RUFF} format shap tests --line-length {line_length}")
    diff = run_repo_command("git diff --shortstat")
    # E.g."181 files changed, 13157 insertions(+), 5131 deletions(-)"
    metrics = [int(s) for s in diff.split() if s.isdigit()]
    assert len(metrics) == 3, "Unexpected output from git diff --shortstat: " + diff
    return dict(
        length=line_length,
        changed=metrics[0],
        insertions=metrics[1],
        deletions=metrics[2],
    )


def scrape():
    data = [evaluate_line_length_effect(length) for length in range(50, 200)]
    df = pd.DataFrame(data)
    df.to_csv("line_length_effect.csv", index=False)


def plot():
    df = pd.read_csv("line_length_effect.csv")
    fig, ax = plt.subplots()
    df.plot(x="length", y="changed", ax=ax)
    ax2 = df.plot(x="length", y="insertions", ax=ax, secondary_y=True)
    ax2.set_ylim(bottom=0)
    plt.show()


if __name__ == "__main__":
    scrape()
    plot()

connortann avatar Jun 28 '24 14:06 connortann