pre-commit-hooks icon indicating copy to clipboard operation
pre-commit-hooks copied to clipboard

`file-contents-sorter` Not working same way everytime

Open teksturi opened this issue 2 years ago • 10 comments

I get strange issue that file-contents-sorter hook does not work same way everytime. I made little test case which trickers the issue. If i run this 10 times it will sometimes pass and sometimes not. I did not have time to look what problem might be.

Add this to file_contents_sorter_test.py

(
    b'pre\nPre\n',
    ['--unique', '--ignore-case'],
    PASS,
    b'pre\nPre\n',
),

And run it with for run in {1..10}; do pytest tests/file_contents_sorter_test.py; done This gives me that about 30% pass and 70% fails

teksturi avatar Jul 29 '22 11:07 teksturi

yeah this set call breaks unique sorting -- it should probably just forbid the combination of unique and ignore case since the result is undefined https://github.com/pre-commit/pre-commit-hooks/blob/v4.3.0/pre_commit_hooks/file_contents_sorter.py#L36

asottile avatar Jul 29 '22 11:07 asottile

@asottile please review if you have time. I'm hoping this PR is simple enough and goes with the spirit of forbidding the combinations of options as you mentioned. Also, I'd appreciate if you could label w/ hacktoberfest-accepted so I can get a tree planted, thanks.

renegaderyu avatar Oct 19 '22 16:10 renegaderyu

I'm not going to review something which doesn't pass tests

asottile avatar Oct 19 '22 16:10 asottile

@asottile Apologies for not seeing the failing tests before asking. I think its ready now.

renegaderyu avatar Oct 20 '22 15:10 renegaderyu

yeah this set call breaks unique sorting -- it should probably just forbid the combination of unique and ignore case since the result is undefined v4.3.0/pre_commit_hooks/file_contents_sorter.py#L36

Could we respect the original word case if the uncased words are equal? For example, let Pre always proceed pre because it's upper case. This can be easily implemented with tuple-based sort keys.

def key(s):
    return (s.lower(), s)

ret = sorted(set(word_list), key=key)

Another option is to respect the original order (stable sort). This can use an "OrderedSet" rather than set.

def unique(iterable):
    return list(OrderedDict.fromkeys(iterable))  # can be replaced with `dict` for Python 3.7+

Both approaches I list above are deterministic.

XuehaiPan avatar Mar 27 '23 08:03 XuehaiPan

no, it should forbid the combination as I said already

On Mon, Mar 27, 2023, 4:24 AM Xuehai Pan @.***> wrote:

yeah this set call breaks unique sorting -- it should probably just forbid the combination of unique and ignore case since the result is undefined v4.3.0/pre_commit_hooks/file_contents_sorter.py#L36 https://github.com/pre-commit/pre-commit-hooks/blob/v4.3.0/pre_commit_hooks/file_contents_sorter.py?rgh-link-date=2022-07-29T11%3A55%3A28Z#L36

Could we respect the original word case if the uncased words are equal? For example, let Pre always proceed pre because it's upper case. This can be easily implemented with tuple-based sort keys.

def key(s): return (s.lower(), s) ret = sorted(set(word_list), key=key)

Another option is to respect the original order (stable sort). This can use an "OrderedSet" rather than set.

def unique(iterable): return list(OrderedDict.fromkeys(iterable)) # can be replaced with dict for Python 3.7+

— Reply to this email directly, view it on GitHub https://github.com/pre-commit/pre-commit-hooks/issues/794#issuecomment-1484715837, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN2BH2U7K4LE7U7DJ5BN5TW6FFD3ANCNFSM55AOQ72Q . You are receiving this because you were mentioned.Message ID: @.***>

asottile avatar Mar 27 '23 12:03 asottile

To make this linter work properly with all type of line endings, we should change the behavior to this: image

Because currently the file-contents-sorter doesn't work consistently on different platforms. The file-contents-sorter currently works correctly only on platforms with \n line ending :(

Childcity avatar Sep 14 '23 20:09 Childcity

@Childcity that's unrelated, documented, and intentional

asottile avatar Sep 14 '23 20:09 asottile

@Childcity that's unrelated, documented, and intentional

So if I working on Windows and have to run file-contents-sorter in the Windows environment I can't do that!

I think, that file-contents-sorter should only sort lines (not changing its content). For fixing line-endings we have separate linter mixed-line-ending

Childcity avatar Sep 14 '23 20:09 Childcity

windows has supported posix newlines for decades

please stop replying here as it is off topic

asottile avatar Sep 14 '23 20:09 asottile