insert-license detection does not ignore spaces after comment symbols

Open XuehaoSun opened this issue 1 year ago • 5 comments

Take Python as an example: `# Copyright...` (one space after the `#`) and `#  Copyright...` (two spaces). The hook does not treat these two lines as the same, so it fails to detect `#  Copyright...` just because there are two spaces after the `#`. As a result, it automatically adds the license again.

XuehaoSun avatar Aug 02 '23 05:08 XuehaoSun

I agree that this is annoying.

Have you tried to fuzzy-match your license? cf. https://github.com/Lucas-C/pre-commit-hooks#fuzzy-license-matching

Lucas-C avatar Aug 02 '23 10:08 Lucas-C

Of course, but then I would have to delete the TODO from each file after running. I'd prefer the hook to skip automatically when a match is found, rather than forcing a TODO to be inserted, because I don't think the space indentation issue is worth fixing file by file. So I ended up choosing --skip-license-insertion-comment to keep the license from being inserted automatically, but this causes --use-current-year to fail, which is obviously not as reasonable as using fuzzy matching to skip.

XuehaoSun avatar Aug 03 '23 01:08 XuehaoSun

> So, I ended up choosing --skip-license-insertion-comment to avoid it from being automatically inserted, but this will cause --use-current-year fail, which is obviously not as reasonable as using fuzzy matching to skip.

Does that mean that your problem is solved?

Otherwise, it's not clear to me what solution you suggest?

Lucas-C avatar Aug 03 '23 18:08 Lucas-C

I think it would be more convenient to decide whether to skip the insertion based on the fuzzy-match ratio. But this feature has not been implemented yet, so I can only use --skip-license-insertion-comment as a substitute.

XuehaoSun avatar Aug 04 '23 01:08 XuehaoSun

I typically use a Python style formatting tool like black or similar, meaning I rarely have to worry about `# Copyright...` versus `#  Copyright...`, which is nice. That said, it seems practical to ignore any extra space(s) (or other whitespace like tabs?) after the `#` when matching the license text.

I was expecting the comparison to be done on the comment block from the file with the comment syntax removed, which might be harder than it seems given the assorted commenting syntax configurations. However, looking at the code, the comparison is actually done on the file contents versus the expected license block built with the configured comment marker and one space.

See https://github.com/Lucas-C/pre-commit-hooks/blob/v1.5.5/pre_commit_hooks/insert_license.py#L167 which inserts one space when preparing the expected license block, not just for inserting into the file if missing, but also used for finding the license: https://github.com/Lucas-C/pre-commit-hooks/blob/v1.5.5/pre_commit_hooks/insert_license.py#L549
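To make the failure mode concrete, here is a minimal sketch of that kind of exact comparison. It is not the code from insert_license.py: the helper names `prefix_license` and `contains_exact` and the sample license text are hypothetical and only illustrate why a second space defeats the match.

```python
def prefix_license(license_lines, comment_prefix="#"):
    """Prepend the comment marker and exactly one space to each license line."""
    # Hypothetical helper, modelled on the "marker plus one space" behaviour described above.
    return [f"{comment_prefix} {line}".rstrip() for line in license_lines]


def contains_exact(file_lines, expected_lines):
    """Return True if expected_lines occurs as a contiguous run inside file_lines."""
    n = len(expected_lines)
    return any(
        file_lines[i:i + n] == expected_lines
        for i in range(len(file_lines) - n + 1)
    )


# Made-up license text, for illustration only.
expected = prefix_license(["Copyright (c) Example Corp"])

one_space = ["# Copyright (c) Example Corp", "import os"]
two_spaces = ["#  Copyright (c) Example Corp", "import os"]

found_one = contains_exact(one_space, expected)    # header with one space
found_two = contains_exact(two_spaces, expected)   # header with two spaces
```

With an exact comparison like this, `found_one` is true and `found_two` is false, so a hook built this way would treat the two-space header as missing and insert the license a second time.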

That is, the simplest fix I can see is to build a regular expression of the form "{opening comment}{at least one space}{line of license}" and use that in the search.
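A rough sketch of what such a regular expression could look like, assuming a single line-comment prefix such as `#` (block comment styles with separate start and end markers would need more work). The function name `build_license_pattern` and the sample license text are hypothetical and not part of the hook.

```python
import re


def build_license_pattern(license_lines, comment_prefix="#"):
    """Build a regex that tolerates any run of spaces or tabs after the comment marker."""
    parts = []
    for line in license_lines:
        text = line.strip()
        if text:
            # {opening comment}{at least one space or tab}{line of license}
            parts.append(
                re.escape(comment_prefix) + r"[ \t]+" + re.escape(text) + r"[ \t]*"
            )
        else:
            # An empty license line becomes a bare comment marker.
            parts.append(re.escape(comment_prefix) + r"[ \t]*")
    # Each part must fill a whole line; consecutive parts are separated by a newline.
    return re.compile(r"^" + r"\n".join(parts) + r"$", re.MULTILINE)


# Made-up license text, for illustration only.
pattern = build_license_pattern(
    ["Copyright (c) Example Corp", "", "Licensed under MIT"]
)

two_spaces = "#  Copyright (c) Example Corp\n#\n#\tLicensed under MIT\nimport os\n"
no_space = "#Copyright (c) Example Corp\n#\n# Licensed under MIT\n"

match_two = pattern.search(two_spaces)   # extra space and a tab are tolerated
match_none = pattern.search(no_space)    # no whitespace after the marker: no match
```

Requiring at least one whitespace character keeps `#Copyright` from matching, which follows the "at least one space" wording; relaxing `[ \t]+` to `[ \t]*` would accept that form as well.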

peterjc avatar Sep 02 '24 15:09 peterjc