Black's public API

Open paugier opened this issue 5 years ago • 21 comments

With previous versions, it was possible to just use black.format_str(string) or black.format_str(string, line_length=82).

Now one needs to do:

black.format_str(string, mode=black.FileMode(line_length=82))

It's not a big deal in a module, but format_str was also useful in notebooks to display examples of Python code programmatically generated (for example by astunparse.unparse(tree)).

Therefore, it would be nice to also support black.format_str(string) and black.format_str(string, line_length=82).

The line_length argument is useful when one wants to get code adapted for a presentation (made with Jupyter for example).
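
Until something like that exists, the shorter call can be recovered with a small wrapper. The following is a minimal sketch, assuming the mode-based signature shown above; the name format_code is hypothetical and not part of Black, and 88 is Black's default line length.

```python
def format_code(source: str, line_length: int = 88) -> str:
    """Format ``source`` with Black, taking ``line_length`` as a plain keyword.

    ``format_code`` is a hypothetical helper name, not a Black function.
    """
    # Imported lazily so that defining the helper does not itself require Black.
    import black

    mode = black.FileMode(line_length=line_length)
    return black.format_str(source, mode=mode)
```

In a notebook this would be called as format_code(generated_source, line_length=82), where generated_source is whatever string the code generator produced.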

paugier avatar Mar 23 '19 10:03 paugier

In general, maybe we should take a moment to consider the APIs we're "setting in stone" for the first stable release in April.

zsol avatar Mar 23 '19 12:03 zsol

Dropping in here - I'd like this, but also for formatting an AST for producing nicely formatted generated code (no parse step).

Also having parse_ast as part of the public API would be really helpful since it has improvements/best practices over Python's built in.

andrewbaxter avatar Jun 14 '19 11:06 andrewbaxter

Any progress? It would be useful for me as well

sheerun avatar Sep 06 '19 20:09 sheerun

For what it's worth, I've stumbled on this issue a number of times trying to find a way to programmatically call black without the use of a shell (i.e., in a CI pipeline). I've found that it's easy enough to do this using result = black.main([".", "--check"], standalone_mode=False) where result is what would have been the black script's exit code.

As for formatting a single string programmatically like this, I don't know how that could be done yet short of creating the file and sending it to black via this call. :thinking:
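
The in-process call described above can be wrapped as follows. This is a sketch only: black.main is Black's command-line entry point rather than a documented Python API, and the helper name run_black_check is hypothetical. It relies on the behavior reported in this comment, namely that with standalone_mode=False the call returns the exit code instead of terminating the interpreter.

```python
from typing import Sequence


def run_black_check(paths: Sequence[str]) -> int:
    """Run ``black --check`` in-process and return its exit code.

    ``run_black_check`` is a hypothetical helper name, not a Black function.
    """
    # Imported lazily so that defining the helper does not itself require Black.
    import black

    # standalone_mode=False makes the click command return instead of
    # calling sys.exit(); usage errors are raised as exceptions.
    result = black.main(["--check", "--quiet", *paths], standalone_mode=False)
    return int(result or 0)
```

A CI step could then fail the build when run_black_check(["."]) returns a non-zero value.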

hawkins avatar Dec 18 '19 20:12 hawkins

As somebody who uses black in a code-generation pipeline, being able to hand black a string (instead of a path to a file) is hugely convenient. gofmt has a similar stable API for formatting Go code programmatically.

rambleraptor avatar Feb 05 '20 20:02 rambleraptor

A standard, documented API would be wonderful. One of my use cases is a unit test to check that all code is formatted with black. I would much rather do this by calling the API than by using subprocess. (Especially since I just ran into an issue where the installed black was available in the library, but the executable was somehow not on the PATH. Yes that was my problem, but I could have avoided it by not bothering with the executable.)
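
A unit test of this kind could be sketched without the executable roughly as below. The helper name unformatted_files is hypothetical, the functions it calls are not a documented public API, and the sketch uses Black's default mode, so it ignores any project configuration in pyproject.toml. Files that Black cannot parse will raise an exception rather than be reported.

```python
from pathlib import Path
from typing import List


def unformatted_files(root: str) -> List[Path]:
    """Return the ``.py`` files under ``root`` that Black would change.

    ``unformatted_files`` is a hypothetical helper name, not a Black function.
    """
    # Imported lazily so that defining the helper does not itself require Black.
    import black

    mode = black.FileMode()  # default settings; project config is not read
    offenders = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # fast=True skips Black's own equivalence and stability checks.
            black.format_file_contents(source, fast=True, mode=mode)
        except black.NothingChanged:
            continue  # already formatted
        offenders.append(path)
    return offenders
```

A test would then assert that unformatted_files("src") is an empty list, where "src" stands in for the project's source directory.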

r-owen avatar Aug 05 '20 21:08 r-owen

Darker is one of the tools which invoke format_str() directly for performance reasons. It would be nice to be able to rely on that to remain a stable part of a supported Python API.

akaihola avatar Jul 18 '21 07:07 akaihola

In akaihola/darker#164 (comment) we noticed that format_str() gives different results for some files than running black from the command line or black.main(..., standalone_mode=False). This should probably be fixed before making format_str() an officially public Python API.

Consider this test script compare_black.py:

import io
import sys
from contextlib import redirect_stdout
from subprocess import PIPE, check_output

from black import FileMode, diff, format_str, main

path = sys.argv[1]

print("============================= format_str =============================")
with open(path) as f:
    content = f.read()
format_str_result = format_str(content, mode=FileMode())
print(diff(content, format_str_result, path, path))

print("================================ main ================================")
with redirect_stdout(io.TextIOWrapper(io.BytesIO())) as main_stdout:
    main(["--quiet", "--diff", path], standalone_mode=False)
main_stdout.seek(0)
print(main_stdout.read())

print("============================= subprocess =============================")
subprocess_result = check_output(["black", "--quiet", "--diff", path], encoding="UTF-8")
print(subprocess_result)

If you create a valid Python source code file with just a few newlines:

echo "



" >newlines.py

and reformat it using the above script:

python compare_black.py newlines.py                   
============================= format_str =============================
--- /tmp/myfile.py
+++ /tmp/myfile.py
@@ -1,5 +0,0 @@
-
-
-
-
-

================================ main ================================

============================= subprocess =============================

You see that format_str() removes those blank lines while main() and black leave the file untouched.

I haven't tried to find other cases where these outputs don't match. I wonder if it only happens to all-whitespace files. Edit: Yes, only with all-whitespace files.

akaihola avatar Jul 19 '21 04:07 akaihola

> format_str() gives different results

From black/__init__.py it's clear why this is the case:

    if not src_contents.strip():
        raise NothingChanged

    dst_contents = format_str(src_contents, mode=mode)

So format_file_contents() will actually return results identical to those from the black command line, for all-whitespace files as well. Thus, it's a better candidate for a public API entry point for formatting a string with Python code in it. It's just a less convenient one, since exception handling for NothingChanged is required. The compare_black.py script above, for example, needs to be changed like this:

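from black import FileMode, NothingChanged, format_file_contents

try:
    format_str_result = format_file_contents(content, fast=False, mode=FileMode())
except NothingChanged:
    # Nothing to reformat, so keep the input as it is.
    format_str_result = content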

akaihola avatar Jul 19 '21 04:07 akaihola

Is there currently a way to directly pass an ast to black? Or do I need to use astor, astunparse etc. to first generate a string and then pass that to black? For several reasons I would like to avoid writing a file to the disk.

I am working on an internal tool which performs some code generation. It does not even have to be a stable interface as it will only run sparingly and supervised so adopting changes would not be a problem.

septatrix avatar Jul 21 '21 17:07 septatrix

> Is there currently a way to directly pass an ast to black?

No, and IMO it's basically a foregone conclusion that Black won't be supporting an AST as input, even within a semi or fully private "API". The formatting logic is tightly coupled to the shape and design of the AST produced by blib2to3 - I just tested passing in an AST from 3.8's ast module and it immediately crashed. Even if we allowed blib2to3's AST instances to be passed in as input, that's not great since it's not unlikely we'd change it. Since blib2to3 is our custom fork of lib2to3, there are definitely a few non-standard modifications in there (and it probably lacks features that other consumers would want).

> For several reasons I would like to avoid writing a file to the disk.

You don't have to use a file; format_file_contents works just fine with a string (although in a stable API I'd guess we would instead suggest format_str plus a newly added function that exposes the safety checks).

> Or do I need to use astor, astunparse etc. to first generate a string and then pass that to black?

Yep.

> It does not even have to be a stable interface as it will only run sparingly and supervised so adopting changes would not be a problem.

Unfortunately even giving the option of doing something even when it's not declared as stable won't stop people, and we would rather not have more of a maintenance burden.

The only thing that could change this situation is if we adopt a third-party externally provided AST when we switch parser (see #2318). There would still be a discussion about whether this is too niche / too internal of a case to support but at least maintainability-wise / technically it would be possible.

ichard26 avatar Jul 21 '21 17:07 ichard26

> > Or do I need to use astor, astunparse etc. to first generate a string and then pass that to black?
>
> Yep.

Okay, then I will probably use that.

> > It does not even have to be a stable interface as it will only run sparingly and supervised so adopting changes would not be a problem.
>
> Unfortunately even giving the option of doing something even when it's not declared as stable won't stop people, and we would rather not have more of a maintenance burden.
>
> The only thing that could change this situation is if we adopt a third-party externally provided AST when we switch parser (see #2318). There would still be a discussion about whether this is too niche / too internal of a case to support but at least maintainability-wise / technically it would be possible.

I just took a quick look at the issue as well as the linked resources. I guess it would be nice if the chosen solution was compatible with the stdlib ast module as this would open possibilities for very neat code generation using black. However I understand that this is probably at the very bottom of the wishlist and would only be a nice-to-have for people who are willing to use unstable internal APIs.

For now I will take the detour using astunparse but as performance is not of concern that should not be a problem.
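
The detour through a string can be sketched as follows. Note the substitution: instead of astunparse, this uses the standard library's ast.unparse, which requires Python 3.9 or newer. The helper names source_from_ast and format_ast are hypothetical, and format_str is used here as discussed in this thread, not as a documented public API.

```python
import ast


def source_from_ast(tree: ast.AST) -> str:
    """Turn a stdlib AST into (unformatted) source code."""
    # fix_missing_locations fills in line/column info on generated nodes.
    return ast.unparse(ast.fix_missing_locations(tree))


def format_ast(tree: ast.AST, line_length: int = 88) -> str:
    """Unparse ``tree`` to a string, then let Black format that string."""
    # Imported lazily so that the unparse step does not itself require Black.
    import black

    mode = black.FileMode(line_length=line_length)
    return black.format_str(source_from_ast(tree), mode=mode)


# A tiny programmatically generated module: ``answer = 42``.
tree = ast.Module(
    body=[
        ast.Assign(
            targets=[ast.Name(id="answer", ctx=ast.Store())],
            value=ast.Constant(value=42),
        )
    ],
    type_ignores=[],
)
```

Here format_ast(tree) goes from the generated tree to formatted source without writing anything to disk.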

septatrix avatar Jul 21 '21 18:07 septatrix

Hi all,

If Black were to commit to a public API, what's your hunch about which functions will be included in it?

I'd like to fix the inconsistent results on all-blank Python files between Black and Darker, and for that I need to decide whether to use format_file_contents() or format_str(). See akaihola/darker#166 for details.

akaihola avatar Sep 05 '21 10:09 akaihola

Stable is looming closer than ever, so should we use it as an opportunity to finally define this public API?

felix-hilden avatar Jan 10 '22 14:01 felix-hilden

No, we should avoid feature creep for the stable release. We can add a defined API in a later release.

JelleZijlstra avatar Jan 10 '22 14:01 JelleZijlstra

Given how close the stable release is and how packed the TODO list is already (mypyc, removing Python 2, power operator spacing, ESP, stability policy enforcement w/ diff-shades?, graduating black's Python 3.10 support) I'd much prefer deferring this to a future release so we can take our time to properly design an API as Jelle said.

ichard26 avatar Jan 10 '22 14:01 ichard26

#1544 suggested having a dedicated function for checking a code sample for modifications. Could be nice.

felix-hilden avatar Jan 29 '22 16:01 felix-hilden

We could try to take this forward. Perhaps we can first commit to the simplest API and then expand as needed. So, we could only expose a function for formatting a string of code that returns the formatted string or an exception if something went wrong.

Currently, the best candidate we have is probably:

def format_file_contents(src_contents: str, *, fast: bool, mode: Mode) -> FileContent:
    ...

Which seems reasonable. Through that function we would also have to commit to Mode, TargetVersion and NothingChanged. And some other exceptions as well? Some discussion points:

  • Perhaps a simpler name should be adopted, like format_str, although that's already in use
  • We could polish the interface a bit, to something like (content: str, *, fast: bool, mode: Mode) -> str. Is there a reason to have the FileContent alias?
  • We could try to clean up __init__.py to only have our public API, so that there's less confusion about what exactly is included. This might be too optimistic, if people already use things and we're worried about breaking non-public API use. I'm less sympathetic, but something to consider.

Some more things to include, either later or at the same time could be:

  • A function for executing Black with its whole configuration discovery and file manipulation
  • Jupyter cell formatting
  • ...?
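
To make the "simplest API" idea concrete, here is one possible shape for such a function. It is only a sketch for discussion: the name format_source and its keyword arguments are hypothetical, and it simply wraps the existing format_file_contents so that callers get a plain str back and never see NothingChanged.

```python
def format_source(content: str, *, line_length: int = 88, fast: bool = False) -> str:
    """Return ``content`` formatted by Black, or unchanged if already formatted.

    ``format_source`` is a hypothetical name used for illustration only.
    """
    # Imported lazily so that defining the helper does not itself require Black.
    import black

    mode = black.Mode(line_length=line_length)
    try:
        # With fast=False Black also runs its equivalence and stability checks.
        return black.format_file_contents(content, fast=fast, mode=mode)
    except black.NothingChanged:
        return content
```

Errors such as unparsable input would still propagate as exceptions, which matches the "formatted string or an exception" contract described above.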

Thoughts!

felix-hilden avatar May 04 '22 19:05 felix-hilden

Thanks @felix-hilden!

> Currently, the best candidate we have is probably:
>
> def format_file_contents(src_contents: str, *, fast: bool, mode: Mode) -> FileContent:

In Darker, we use black.format_str() but could just as well use black.format_file_contents() once those two are fixed to produce identical results on all-whitespace files (see #2484 and akaihola/darker#166).

> A function for executing Black with its whole configuration discovery and file manipulation

Currently Darker parses Black configuration files by itself and passes some of the configuration options to black.format_str(..., mode=Mode(...)). It could use a Black public API instead for discovering, loading and parsing Black configuration (but not executing the re-format).
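
As a rough illustration of what "discover, load and parse, but do not reformat" could look like, the sketch below builds a Mode from the nearest pyproject.toml. It is not Darker's actual code. The helper name load_black_mode is hypothetical, only three options are handled, and find_pyproject_toml and parse_pyproject_toml are Black internals whose signatures and return format (option names with dashes turned into underscores) are assumptions that may differ between versions.

```python
from typing import Tuple


def load_black_mode(sources: Tuple[str, ...]):
    """Build a Black ``Mode`` from the configuration that applies to ``sources``.

    ``load_black_mode`` is a hypothetical helper name, not a Black function.
    """
    # Imported lazily so that defining the helper does not itself require Black.
    import black

    # Locate the pyproject.toml for the given paths; may be None.
    config_path = black.find_pyproject_toml(sources)
    config = black.parse_pyproject_toml(config_path) if config_path else {}

    # Assumed format: target versions are names such as "py38".
    versions = {
        black.TargetVersion[name.upper()]
        for name in config.get("target_version", [])
    }
    return black.Mode(
        target_versions=versions,
        line_length=config.get("line_length", 88),  # 88 is Black's default
        string_normalization=not config.get("skip_string_normalization", False),
    )
```

The resulting mode could then be passed to black.format_str(..., mode=mode) exactly as described above.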

The complete list of Black internals used by Darker is:

# `FileMode as Mode` required to satisfy mypy==0.782. Strange.
from black import FileMode as Mode
from black import (
    TargetVersion,
    assert_equivalent,
    parse_ast,
    stringify_ast,
    find_project_root,
    find_pyproject_toml,
    parse_pyproject_toml,
    format_str,
    re_compile_maybe_verbose,
)
from black.const import DEFAULT_EXCLUDES, DEFAULT_INCLUDES
from black.files import gen_python_files
from black.report import Report

akaihola avatar May 05 '22 11:05 akaihola

Hi team, is there any plan to make format_cell() a public Python API?

It helps a lot when formatting code strings within a notebook cell. I noticed that jupyter_black already uses this function, but it is not a public Python API yet. I'd like to use it to simplify the codebase of a project, but I'm a little worried about whether it will be maintained in future releases.

If you have a plan to add it as a public API, I'm more than happy to help.

Hangzhi avatar Jun 23 '22 15:06 Hangzhi