ruff
ruff copied to clipboard
Feature request: Add Length-Sort config to isort
Thanks for an awsome library. I am currently trying to replace the old flake8 and isort setup on our repo with ruff.
We can almost replace isort with ruff, but we are currently using the Length-Sort configuration from isort, which currently can't be found in ruff.
Love the Annoy-o-Tron.
We also need this to be able to migrate to ruff!
Hi @charliermarsh 👋🏼 I've started learning Rust for a while but haven't contributed yet and want to start from somewhere. So if you allow can I pick this one ?
Hi @charliermarsh, just wanted to know that if you have any idea for the implementation approach for this..if so could let me know I am thinking to pick this up. I saw the comment, a bit confused by it.. so if you can elaborate it would be great.
Hi there :wave:
I will try to implement this option, but I have a question about the already implemented method.
Do I understand correctly that order_by_type
and length-sort
are mutually exclusive options? If so, should I add an enum with the options?
While you're at supporting length-sort
, please also support length-sort-straight
. That's probably the biggest isort feature I rely on that is missing from ruff.
Do I understand correctly that order_by_type and length-sort are mutually exclusive options? If so, should I add an enum with the options?
I don't know the answer to that. My guess is that they are not mutually exclusive since different isort options control them, but I don't know how they interact if both are set.
For length-sort
and length-sort-straight
my guess is that the latter is a subset of the former, but I'm not sure how they interact if say the former were true and the latter were false.
Hey! I'm currently working on this one and I have a working implementation of the basic functionality. @suharnikov if you're still working on it as well I'll leave it for you of course :)
I do have one question though, I couldn't help but notice that there are a bunch of functions that take in a whole lot of isort
settings (like format_imports
in rules/isort/mod.rs
), and I was wondering why they don't just take in a reference to an isort::settings::Settings
? Is there a particular reason for that? I tried changing the function signatures and it seems to work alright that way too.
Having all the settings passed around separately yields a fair amount of tedious work when you're trying to add a new setting because you have to chase down all the places that they need to be added and change all the function signatures.
But if there's a good reason that I just haven't figured out I'd love to know about it 🙂
@bluthej - I think it's probably okay to refactor those methods to accept Settings
. If you're able to work on that, do you mind putting it up as a standalone PR, with just that refactor, to make it easier to review and merge?
@charliermarsh absolutely! That's no problem at all :)
Then I'll start with the refactor and come back to this one when it's merged 😊
#7963 is merged, which should pave the way for this.
Hey, so I got back to working on this, and that led me to the following question.
When using length-sort, isort will count the dots of relative imports in the length instead of sorting by distance and then length of the module name itself (without the dots). Is that the behavior we want? Or do we want to first sort based on distance and then length of module name?
Note: We should clarify whether the length
means number of characters or the unicode width. Or are unicode names not supported by python (whether they're discouraged is another question ;))
That's a fair point!
On my system (Python 12 on Arch Linux) I cannot seem to be able to import a module with a non-ASCII character, I get a module not found error. However importing members with non-ASCII characters works. Now isort uses the built-in len
, which yields the number of characters, not the number of bytes or whatever.
I guess we can use .chars().count()
to get the number of characters for members and .len()
for modules (I assume the len
method is faster because it doesn't iterate over the characters, it just accesses the length stored in the str
/String
). We might not sort modules properly if there are non-ASCII characters in some of them but as far as I can tell that means the input is not valid Python so it's probably worth the speedup.
Now isort uses the built-in
len
, which yields the number of characters
What’s a “character”? I think len
does codepoints and not, say, grapheme clusters …
For a more accurate length, I think we’d have to iterate grapheme clusters and check which ones are likely to render at double width. (Don’t let yourself be confused by “wide character” sometimes meaning storage size and sometimes literal width on screen)
E.g. check out how my terminal renders “平”, which of course is a valid Python identifier:
What’s a “character”? I think
len
does codepoints and not, say, grapheme clusters …
According to the Python docs:
Since Python 3.0, the language’s str type contains Unicode characters
and they add:
The Unicode standard describes how characters are represented by code points
So you are correct, len
does codepoints. The Rust equivalent would be .chars().count()
right?
For a more accurate length, I think we’d have to iterate grapheme clusters
I wouldn't necessarily say "more accurate", but the argument can be made that it's more "natural" for humans. The downside is that it would make ruff behave differently from isort, so there's a bit of a tradeoff.
E.g. check out how my terminal renders “平”, which of course is a valid Python identifier:
Interesting! I'm going to have to figure out why my example didn't work :)
I would prefer that we use Unicode width, which is what we use everywhere else (even though we use the term "length" -- we nearly renamed them all when we released the formatter but decided to defer that decision).
(You should be able to call .width()
on any string using the use unicode_width::UnicodeWidthChar
trait.)
That’s exactly the concept I was talking about btw :smiley:
(You should be able to call
.width()
on any string using theuse unicode_width::UnicodeWidthChar
trait.)
I think you meant the unicode_width::UnicodeWidthStr
trait for strings :)
I switched to .width()
and added some tests with non-ASCII characters.
What about relative imports? Should we follow isort and count the dots in the length?