`reuse annotate` on gettext PO files
While trying to upgrade to reuse-tool 2.0 (before noticing it got yanked), we ran into several issues with snippet detection. I wanted to annotate the files to avoid detection, but annotate probably relies on the same detection as well.
Instead of adding comments at the top of the file, it picked one of the fuzzy strings, replaced its comments with the license header, and removed all the gettext special comments:
@@ -32508,9 +32538,11 @@ msgid ""
"and follows `REUSE 3.0 specification <https://reuse.software/>`_."
msgstr ""
-#: ../../../README.rst:72 ../../contributing/license.rst:7
-#, fuzzy
-#| msgid "Copyright © 2012–2021 Michal Čihař [email protected]"
+# Copyright © 2012–2021 Michal Čihař [email protected]"
+# Copyright © Michal Čihař <[email protected]>
+#
+# SPDX-License-Identifier: GPL-3.0-or-later
+
msgid "Copyright © Michal Čihař [email protected]"
msgstr "Copyright © 2012–2021 Michal Čihař [email protected]"
Is it expected that annotate looks for snippets in the middle of the file? That makes it incredibly slow on such files.
Can annotate be aware of the gettext PO syntax and avoid destroying it?
Before reuse annotate: https://github.com/WeblateOrg/weblate/blob/c9e2bb29238ec7fe7fb48aa46ae816ab0ecba09a/docs/locales/fr/LC_MESSAGES/docs.po#L32511-L32515
After reuse annotate (and other changes): https://github.com/WeblateOrg/weblate/blob/f4b17ca9169db46973b9228e2270d0ba582cd510/docs/locales/fr/LC_MESSAGES/docs.po#L32541-L32547
oof.
This will require a little more attention than I have time for now. But this obviously shouldn't happen.
The problem is probably that reuse is rather naïve in finding the file header. It needs to be naïve, because it tries to support a lot of languages, and we don't have a robust library to deal with all manner of languages and comment styles.
So you get weird stuff like this.
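To make the idea of a less naïve header search concrete, here is a minimal sketch of what a PO-aware approach could look like. This is not how reuse works today: `split_po_header` and `annotate_po` are hypothetical names made up for illustration, and the sketch assumes that only the plain `#` translator comments at the very top of the file count as the header, while gettext special comments (`#:`, `#,`, `#|`, `#.`, `#~`) are never touched.

```python
def is_translator_comment(line):
    """Return True for a plain '#' comment line.

    gettext special comments ('#:', '#,', '#|', '#.', '#~') return False,
    so they are never treated as part of the file header.
    """
    stripped = line.rstrip("\r\n")
    return stripped == "#" or stripped.startswith("# ")


def split_po_header(text):
    """Split PO text into (leading translator comments, everything else)."""
    lines = text.splitlines(keepends=True)
    end = 0
    for line in lines:
        if not is_translator_comment(line):
            break
        end += 1
    return "".join(lines[:end]), "".join(lines[end:])


def annotate_po(text, header_lines):
    """Prepend header lines that are missing from the top comment block.

    Only the leading comment block is inspected; copyright-looking strings
    inside msgid/msgstr entries further down are left alone.
    """
    existing, rest = split_po_header(text)
    existing_lines = set(existing.splitlines())
    comments = [f"# {line}" if line else "#" for line in header_lines]
    new = "".join(c + "\n" for c in comments if c not in existing_lines)
    return new + existing + rest
```

Because the search stops at the first line that is not a plain comment, it never scans the rest of a large catalog, which would also avoid the slowdown mentioned above.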
Regarding looking for snippets in the whole file, #699 may have introduced that. It was meant for lint/spdx but not for annotate.
This doesn't concern an SPDX snippet though, I don't think.
It seems that the annotate behavior hasn't changed; it behaves this way on older releases as well.
https://github.com/fsfe/reuse-tool/pull/699 just made me run into this issue because it started to detect a different license in the file than the one we had in the dep5 file. It was actually useful in this case because it made me remove some bogus comments we had in these files…