
Switch to a new parser

Open JelleZijlstra opened this issue 3 years ago • 4 comments

Currently, Black uses a vendored version of lib2to3 for parsing. This works well for parsing Python 2 and early Python 3, but Python has now moved on to a PEG-based parser (PEP 617), and lib2to3 is no longer being maintained.

So we need a new parser. There are a few existing options we could leverage (Parso, LibCST), but the migration is going to be a lot of work. We're doing some early brainstorming in a Google Doc. This issue exists so that there is a public record that we know this is a problem.

Concrete pieces of syntax that are blocked on the new grammar include parenthesized context managers and the match statement, both in Python 3.10.
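For reference, here is what the two constructs look like. This sketch only asks the running CPython interpreter's own PEG-based parser, via ast.parse, whether it accepts them; it does not exercise Black or lib2to3. The file names and the `command` variable are placeholders, and nothing is executed.

```python
import ast
import sys

# Parenthesized context managers (officially allowed from Python 3.10)
# and the match statement (PEP 634, Python 3.10). Only parsed, never run.
SOURCE = '''
with (
    open("a.txt") as a,
    open("b.txt") as b,
):
    pass

match command:
    case ["go", direction]:
        pass
    case _:
        pass
'''


def parses_with_cpython(source: str) -> bool:
    """Return True if the running interpreter's own parser accepts source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # Expected to be True on 3.10+ and False on older interpreters.
    print(sys.version_info[:2], parses_with_cpython(SOURCE))
```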

JelleZijlstra, Jun 08 '21

The main bug is: https://bugs.python.org/issue40360 (and also https://bugs.python.org/issue36541).

I think there's a fairly straightforward way of wrapping the new Python parser to provide the functionality that Black (and other source-level tools) need. However, it's a non-trivial amount of work, and I'm loath to do it unless I'm sure it'll be used and that nobody else is already doing it. (There appears to be one existing wrapper, namely leoAst.py; I've looked at it a bit, but it seems much more complicated than necessary, which could make it both difficult to use and a maintenance burden.)
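As a rough illustration of the raw material such a wrapper would start from (this is not kamahen's proposed design or leoAst.py, and it assumes Python 3.9+ for ast.unparse): the stdlib ast module discards comments and spacing, but its node positions and the tokenize module still expose them. The helpers comments and statement_text are hypothetical names used only for this sketch.

```python
import ast
import io
import tokenize

SOURCE = "x = 1 +  2  # keep me\n"


def comments(source: str) -> list[str]:
    """Collect comment tokens, which ast.parse throws away (hypothetical helper)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


def statement_text(source: str) -> list[str]:
    """Recover the exact original text of each top-level statement from the
    position info (lineno/col_offset/end_*) that ast keeps (hypothetical helper)."""
    tree = ast.parse(source)
    return [ast.get_source_segment(source, node) for node in tree.body]


if __name__ == "__main__":
    print(ast.unparse(ast.parse(SOURCE)))  # x = 1 + 2  (spacing and comment lost)
    print(statement_text(SOURCE))          # ['x = 1 +  2']
    print(comments(SOURCE))                # ['# keep me']
```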

There's some other discussion at https://github.com/kamahen/pykythe/issues/27, https://github.com/google/yapf/issues/825#issuecomment-868805396, https://github.com/google/yapf/issues/894#issuecomment-799867767, and elsewhere.

kamahen, Jul 05 '21

Has tree-sitter been considered? It already implements a Python parser (https://github.com/tree-sitter/tree-sitter-python), and I think it can be used to build formatters.

ianliu, Nov 05 '21

@ianliu interesting, I hadn't heard of that!

Looking at the Python bindings (https://github.com/tree-sitter/py-tree-sitter), it might be hard to get it to work for us:

  • Installing it requires a C compiler on most platforms (there are wheels only for macOS on Python 3.8).
  • Even then, you don't get a Python grammar: you have to clone a repo and build the grammar at runtime.

That sounds like it would lead to a lot of people with mildly exotic systems who'd be unable to install Black if it depended on this library.

JelleZijlstra, Nov 05 '21

According to its README, LibCST now supports Python 3.0 through 3.11 syntax, though it also says:

It is more difficult to implement tools that focus almost exclusively on whitespace on top of LibCST instead of lib2to3. For example, Black would need to modify whitespace nodes instead of prefix strings, making its implementation much more complex.
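To make the "prefix strings" idea concrete: in lib2to3's tree, each leaf carries the whitespace and comments that precede it as a prefix string, so a formatter can change layout by rewriting prefixes. The sketch below only approximates that model with the stdlib tokenize module (lib2to3 itself was removed from the standard library in Python 3.13); prefixed_tokens is a hypothetical helper, not Black's or lib2to3's code.

```python
import io
import tokenize

# Token types treated as trivia: folded into the next token's prefix,
# roughly mirroring how lib2to3 attaches whitespace and comments to leaves.
TRIVIA = {tokenize.COMMENT, tokenize.NL, tokenize.INDENT, tokenize.DEDENT}


def prefixed_tokens(source: str) -> list[tuple[str, str]]:
    """Split source into (prefix, value) pairs (hypothetical helper).

    Each prefix holds the whitespace and comments since the previous
    significant token, so joining prefix + value reproduces the source.
    """
    # Absolute character offset at which each line starts; tokenize rows are 1-based.
    line_starts = [0]
    for line in io.StringIO(source).readlines():
        line_starts.append(line_starts[-1] + len(line))

    def offset(pos: tuple[int, int]) -> int:
        row, col = pos
        return line_starts[row - 1] + col

    pairs = []
    prev_end = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in TRIVIA:
            continue  # leave it in the gap; it ends up in the next prefix
        start = offset(tok.start)
        pairs.append((source[prev_end:start], tok.string))
        prev_end = offset(tok.end)
    return pairs


SOURCE = "def f(a,b):\n    return a+b  # sum\n"

pairs = prefixed_tokens(SOURCE)
assert "".join(p + v for p, v in pairs) == SOURCE  # lossless round trip

# Toy formatting step: one space after each comma, done purely by
# rewriting the prefix of the token that follows the comma.
fixed = [
    (" " if i > 0 and pairs[i - 1][1] == "," else p, v)
    for i, (p, v) in enumerate(pairs)
]
print("".join(p + v for p, v in fixed), end="")
# def f(a, b):
#     return a+b  # sum
```

In a CST like LibCST's, the same change would instead mean locating and replacing a dedicated whitespace node, which is the extra complexity the README quote refers to.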

jakkdl, Nov 07 '22

Hi @JelleZijlstra, I wanted to know whether a resolution is planned any time soon. Are there any alternatives or workarounds for now?

Udayraj123, Mar 29 '23

There are no concrete plans to switch to a new parser, but we have full support for the latest Python grammar changes through some hacks on our existing parser. What do you need a workaround for?

JelleZijlstra, Mar 29 '23

Oh, I see. I was facing this issue with the match/case syntax: https://github.com/psf/black/issues/2242. I guess it might be a configuration issue on my end, then.

Edit: The error shown in that discussion doesn't seem to involve match/case; was it fixed later?

Udayraj123, Mar 29 '23