ruff Feature request: Add Length-Sort config to isort

Thanks for an awsome library. I am currently trying to replace the old flake8 and isort setup on our repo with ruff.

We can almost replace isort with ruff, but we are currently using the Length-Sort configuration from isort, which currently can't be found in ruff.

Jan 02 '23 22:01 einarwar

Love the Annoy-o-Tron.

Jan 02 '23 22:01 charliermarsh

We also need this to be able to migrate to ruff!

Feb 01 '23 14:02 RobertCraigie

Hi @charliermarsh 👋🏼 I've started learning Rust for a while but haven't contributed yet and want to start from somewhere. So if you allow can I pick this one ?

Apr 08 '23 18:04 mihirsn

Hi @charliermarsh, just wanted to know that if you have any idea for the implementation approach for this..if so could let me know I am thinking to pick this up. I saw the comment, a bit confused by it.. so if you can elaborate it would be great.

Apr 14 '23 04:04 mihirsn

Hi there :wave: I will try to implement this option, but I have a question about the already implemented method. Do I understand correctly that order_by_type and length-sort are mutually exclusive options? If so, should I add an enum with the options?

Jun 01 '23 15:06 vadim-su

While you're at supporting length-sort, please also support length-sort-straight. That's probably the biggest isort feature I rely on that is missing from ruff.

Jun 08 '23 17:06 warsaw

Do I understand correctly that order_by_type and length-sort are mutually exclusive options? If so, should I add an enum with the options?

I don't know the answer to that. My guess is that they are not mutually exclusive since different isort options control them, but I don't know how they interact if both are set.

For length-sort and length-sort-straight my guess is that the latter is a subset of the former, but I'm not sure how they interact if say the former were true and the latter were false.

Jun 17 '23 19:06 warsaw

Hey! I'm currently working on this one and I have a working implementation of the basic functionality. @suharnikov if you're still working on it as well I'll leave it for you of course :)

I do have one question though, I couldn't help but notice that there are a bunch of functions that take in a whole lot of isort settings (like format_imports in rules/isort/mod.rs), and I was wondering why they don't just take in a reference to an isort::settings::Settings? Is there a particular reason for that? I tried changing the function signatures and it seems to work alright that way too. Having all the settings passed around separately yields a fair amount of tedious work when you're trying to add a new setting because you have to chase down all the places that they need to be added and change all the function signatures. But if there's a good reason that I just haven't figured out I'd love to know about it 🙂

Sep 25 '23 17:09 bluthej

@bluthej - I think it's probably okay to refactor those methods to accept Settings. If you're able to work on that, do you mind putting it up as a standalone PR, with just that refactor, to make it easier to review and merge?

Sep 25 '23 17:09 charliermarsh

@charliermarsh absolutely! That's no problem at all :)

Then I'll start with the refactor and come back to this one when it's merged 😊

Sep 25 '23 19:09 bluthej

#7963 is merged, which should pave the way for this.

Oct 30 '23 04:10 charliermarsh

Hey, so I got back to working on this, and that led me to the following question.

When using length-sort, isort will count the dots of relative imports in the length instead of sorting by distance and then length of the module name itself (without the dots). Is that the behavior we want? Or do we want to first sort based on distance and then length of module name?

Nov 02 '23 07:11 bluthej

Note: We should clarify whether the length means number of characters or the unicode width. Or are unicode names not supported by python (whether they're discouraged is another question ;))

Nov 02 '23 07:11 MichaReiser

That's a fair point!

On my system (Python 12 on Arch Linux) I cannot seem to be able to import a module with a non-ASCII character, I get a module not found error. However importing members with non-ASCII characters works. Now isort uses the built-in len, which yields the number of characters, not the number of bytes or whatever.

I guess we can use .chars().count() to get the number of characters for members and .len() for modules (I assume the len method is faster because it doesn't iterate over the characters, it just accesses the length stored in the str/String). We might not sort modules properly if there are non-ASCII characters in some of them but as far as I can tell that means the input is not valid Python so it's probably worth the speedup.

Nov 02 '23 17:11 bluthej

Now isort uses the built-in len, which yields the number of characters

What’s a “character”? I think len does codepoints and not, say, grapheme clusters …

For a more accurate length, I think we’d have to iterate grapheme clusters and check which ones are likely to render at double width. (Don’t let yourself be confused by “wide character” sometimes meaning storage size and sometimes literal width on screen)

E.g. check out how my terminal renders “平”, which of course is a valid Python identifier:

Nov 03 '23 10:11 flying-sheep

What’s a “character”? I think len does codepoints and not, say, grapheme clusters …

According to the Python docs:

Since Python 3.0, the language’s str type contains Unicode characters

and they add:

The Unicode standard describes how characters are represented by code points

So you are correct, len does codepoints. The Rust equivalent would be .chars().count() right?

For a more accurate length, I think we’d have to iterate grapheme clusters

I wouldn't necessarily say "more accurate", but the argument can be made that it's more "natural" for humans. The downside is that it would make ruff behave differently from isort, so there's a bit of a tradeoff.

E.g. check out how my terminal renders “平”, which of course is a valid Python identifier:

Interesting! I'm going to have to figure out why my example didn't work :)

Nov 03 '23 17:11 bluthej

I would prefer that we use Unicode width, which is what we use everywhere else (even though we use the term "length" -- we nearly renamed them all when we released the formatter but decided to defer that decision).

Nov 03 '23 17:11 charliermarsh

(You should be able to call .width() on any string using the use unicode_width::UnicodeWidthChar trait.)

Nov 03 '23 17:11 charliermarsh

That’s exactly the concept I was talking about btw :smiley:

Nov 03 '23 18:11 flying-sheep

(You should be able to call .width() on any string using the use unicode_width::UnicodeWidthChar trait.)

I think you meant the unicode_width::UnicodeWidthStr trait for strings :) I switched to .width() and added some tests with non-ASCII characters.

What about relative imports? Should we follow isort and count the dots in the length?

Nov 04 '23 10:11 bluthej

ruff ruff copied to clipboard

Feature request: Add Length-Sort config to isort

ruff
ruff copied to clipboard