ruff icon indicating copy to clipboard operation
ruff copied to clipboard

Consider using `libcst`

Open sobolevn opened this issue 2 years ago • 6 comments

https://github.com/Instagram/LibCST

It is partially written in Rust and allows you to have way more over the syntax tree. It will allow code rewrite and will allow more syntax checks.

I highly recommend trying this before the project will have a lot of exisiting source code.

sobolevn avatar Sep 01 '22 16:09 sobolevn

It seems like the author was looking into this in #25. A problem with LibCST is that it's significantly slower than the AST parser.

Jackenmen avatar Sep 01 '22 16:09 Jackenmen

As a long time maintainer of different flake8 tools - no one ever complained about performance (however, it is 100% important), but people do complain about missing features such as code auto-fixing, minor style details that are not possible to get right with ast + tokenize, etc.

This is my opinion :)

sobolevn avatar Sep 01 '22 16:09 sobolevn

I definitely want to look into LibCST! The AST is more complicated so I have to rewrite the traversal code, but that's mostly mechanical. It may be the case that a more expensive parse is actually worth it -- LibCST is super powerful and would immediately enable auto-fixing and other features. It's also possible that we could use LibCST only for autofixing, in which case, we wouldn't have to parse every file.

charliermarsh avatar Sep 01 '22 17:09 charliermarsh

LibCST would totally have longer parsing time, but it might also help when working with regex (I haven't seen your project - but I speak from my experience with pyflakes, pydocstyle, and others). There are lots of checks that are implemented with regex (which can be very slow on large inputs). But, LibCST will make most of them absolete.

So, probably we can win some performance during the execution :)

sobolevn avatar Sep 01 '22 17:09 sobolevn

(Very flattering that you even care to comment here. You've contributed to some of my favorite tools. Appreciate the engagement! Same for you @jack1142!)

charliermarsh avatar Sep 01 '22 17:09 charliermarsh

As a long time maintainer of different flake8 tools - no one ever complained about performance

It depends a lot on the size and count of files you lint but when running flake8 as pre-commit hook, the delay can sometimes be noticeable.

However if someone, hypothetically, decided to refactor flake8 to use LibCST instead of ast module (leaving the code as pure Python), it would likely be way slower so that's where my worry stems from. Maybe it's not going to be a problem when you skip the Python VM entirely by writing everything in Rust with no pure Python code but it's hard to tell without more testing.

Jackenmen avatar Sep 01 '22 17:09 Jackenmen

It may worth mentioning that the Rust parser implementation of LibCST requires setting the environment variable LIBCST_PARSER_TYPE to native to activate, otherwise it defaults to the pure Python parser implementation, which is slower.

This important knowledge is not explicitly mentioned anywhere in LibCST's README or document, so I guess I may want to point it out. Please kindly take this into consideration when using and examining the performance aspect of LibCST.

As a recent contributor to LibCST I am really excited at ruff and its potential synergy with LibCST. Looking forward to see more to come! 🎉

MapleCCC avatar Sep 02 '22 11:09 MapleCCC

I'm tracking the exploration in #89. That branch implements F634 (the ~simplest AST-based lint check) using LibCST. It is much slower than the current version, but I'd still like to see how difficult it is to layer autofix on top of that foundation.

charliermarsh avatar Sep 02 '22 20:09 charliermarsh

I'm still interested in using LibCST to power autofixing specifically: find errors + erroring statements via the current strategy, then pass snippets to LibCST to correct.

charliermarsh avatar Sep 28 '22 21:09 charliermarsh

We're now using LibCST to power some of the autofix behavior 🚀 🚀 🚀

charliermarsh avatar Oct 03 '22 20:10 charliermarsh