`reuse annotate` on gettext PO files

Open nijel opened this issue 2 years ago • 4 comments

While trying to upgrade to reuse-tool 2.0 (before noticing it got yanked), we ran into several issues with snippet detection. I wanted to annotate the files to avoid the detection, but annotating is probably based on the same detection as well.

Instead of adding comments at the top of the file, it picked one of the fuzzy strings, replaced that entry's comments with the license header, and removed all of the gettext special comments:

 @@ -32508,9 +32538,11 @@ msgid ""
 "and follows `REUSE 3.0 specification <https://reuse.software/>`_."
 msgstr ""
 
-#: ../../../README.rst:72 ../../contributing/license.rst:7
-#, fuzzy
-#| msgid "Copyright © 2012–2021 Michal Čihař [email protected]"
+# Copyright © 2012–2021 Michal Čihař [email protected]"
+# Copyright © Michal Čihař <[email protected]>
+#
+# SPDX-License-Identifier: GPL-3.0-or-later
+
 msgid "Copyright © Michal Čihař [email protected]"
 msgstr "Copyright © 2012–2021 Michal Čihař [email protected]"
 

Is it expected that annotate looks for snippets in the middle of the file? That makes it incredibly slow on such files.

Can annotate be aware of the gettext PO syntax and avoid destroying it?
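
For illustration, here is a minimal Python sketch of what PO-aware annotating could look like. It is not how reuse works internally; `annotate_po` and `is_translator_comment` are hypothetical names made up for this example. The only assumption taken from gettext itself is that `#.`, `#:`, `#,`, `#|` and `#~` mark tool-generated comments, while plain `#` lines are translator comments. The sketch only ever writes at the top of the file and leaves every entry untouched:

```python
from typing import List

# gettext reserves "#" followed by one of these characters for special
# comments: extracted comments, references, flags, previous strings, obsolete.
SPECIAL_PREFIXES = ("#.", "#:", "#,", "#|", "#~")


def is_translator_comment(line: str) -> bool:
    """Return True for plain translator comments such as '# text' or '#'."""
    stripped = line.rstrip("\n")
    return stripped.startswith("#") and not stripped.startswith(SPECIAL_PREFIXES)


def annotate_po(text: str, header_lines: List[str]) -> str:
    """Prepend header_lines as translator comments to a PO file.

    Hypothetical helper: only the top of the file is modified, so fuzzy
    markers, references and previous-msgid comments further down survive.
    """
    lines = text.splitlines(keepends=True)

    # Count the leading run of translator comments (the existing file header).
    end = 0
    while end < len(lines) and is_translator_comment(lines[end]):
        end += 1

    # Empty strings in header_lines become bare "#" separator lines.
    new_block = [f"# {h}\n" if h else "#\n" for h in header_lines]
    if end > 0:
        # Separate the new header from pre-existing header comments.
        new_block.append("#\n")

    return "".join(new_block + lines)


if __name__ == "__main__":
    sample = (
        '# Translators file\n'
        'msgid ""\n'
        'msgstr ""\n'
        '\n'
        '#: a.rst:1\n'
        '#, fuzzy\n'
        '#| msgid "Copyright old"\n'
        'msgid "Copyright new"\n'
        'msgstr "x"\n'
    )
    print(annotate_po(sample, ["SPDX-License-Identifier: GPL-3.0-or-later"]))
```

With something like this, the `#, fuzzy` and `#| msgid` lines in the diff above would be left alone, and there would be no need to scan the middle of the file at all.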

Before reuse annotate: https://github.com/WeblateOrg/weblate/blob/c9e2bb29238ec7fe7fb48aa46ae816ab0ecba09a/docs/locales/fr/LC_MESSAGES/docs.po#L32511-L32515

After reuse annotate (and other changes): https://github.com/WeblateOrg/weblate/blob/f4b17ca9169db46973b9228e2270d0ba582cd510/docs/locales/fr/LC_MESSAGES/docs.po#L32541-L32547

nijel avatar Jun 27 '23 11:06 nijel

oof.

This will require a little more attention than I have time for now. But this obviously shouldn't happen.

The problem is probably that reuse is rather naïve in finding the file header. It needs to be naïve, because it tries to support a lot of languages, and we don't have a robust library to deal with all manner of languages and comment styles.

So you get weird stuff like this.

carmenbianca avatar Jun 27 '23 11:06 carmenbianca

Regarding looking for snippets in the whole file, #699 may have introduced that. It was meant for lint/spdx but not for annotate.

mxmehl avatar Jun 27 '23 12:06 mxmehl

This doesn't concern an SPDX snippet though, I don't think.

carmenbianca avatar Jun 27 '23 12:06 carmenbianca

It seems the annotate behavior hasn't changed; it behaves this way on older releases as well.

https://github.com/fsfe/reuse-tool/pull/699 just made me run into this issue because it started detecting a different license in the file than the one we had in the dep5 file. It was actually useful in this case because it made me remove some bogus comments we had in these files…

nijel avatar Jun 27 '23 12:06 nijel