ruff
ruff copied to clipboard
Consider using `libcst`
https://github.com/Instagram/LibCST
It is partially written in Rust and allows you to have way more over the syntax tree. It will allow code rewrite and will allow more syntax checks.
I highly recommend trying this before the project will have a lot of exisiting source code.
It seems like the author was looking into this in #25. A problem with LibCST is that it's significantly slower than the AST parser.
As a long time maintainer of different flake8
tools - no one ever complained about performance (however, it is 100% important), but people do complain about missing features such as code auto-fixing, minor style details that are not possible to get right with ast
+ tokenize
, etc.
This is my opinion :)
I definitely want to look into LibCST! The AST is more complicated so I have to rewrite the traversal code, but that's mostly mechanical. It may be the case that a more expensive parse is actually worth it -- LibCST is super powerful and would immediately enable auto-fixing and other features. It's also possible that we could use LibCST only for autofixing, in which case, we wouldn't have to parse every file.
LibCST would totally have longer parsing time, but it might also help when working with regex (I haven't seen your project - but I speak from my experience with pyflakes
, pydocstyle
, and others). There are lots of checks that are implemented with regex
(which can be very slow on large inputs). But, LibCST will make most of them absolete.
So, probably we can win some performance during the execution :)
(Very flattering that you even care to comment here. You've contributed to some of my favorite tools. Appreciate the engagement! Same for you @jack1142!)
As a long time maintainer of different flake8 tools - no one ever complained about performance
It depends a lot on the size and count of files you lint but when running flake8 as pre-commit hook, the delay can sometimes be noticeable.
However if someone, hypothetically, decided to refactor flake8 to use LibCST instead of ast
module (leaving the code as pure Python), it would likely be way slower so that's where my worry stems from. Maybe it's not going to be a problem when you skip the Python VM entirely by writing everything in Rust with no pure Python code but it's hard to tell without more testing.
It may worth mentioning that the Rust parser implementation of LibCST requires setting the environment variable LIBCST_PARSER_TYPE
to native
to activate, otherwise it defaults to the pure Python parser implementation, which is slower.
This important knowledge is not explicitly mentioned anywhere in LibCST's README or document, so I guess I may want to point it out. Please kindly take this into consideration when using and examining the performance aspect of LibCST.
As a recent contributor to LibCST I am really excited at ruff and its potential synergy with LibCST. Looking forward to see more to come! 🎉
I'm tracking the exploration in #89. That branch implements F634 (the ~simplest AST-based lint check) using LibCST. It is much slower than the current version, but I'd still like to see how difficult it is to layer autofix on top of that foundation.
I'm still interested in using LibCST to power autofixing specifically: find errors + erroring statements via the current strategy, then pass snippets to LibCST to correct.
We're now using LibCST to power some of the autofix behavior 🚀 🚀 🚀