Licensing notices are used in different ways in different repositories

Open gonzalo-bulnes opened this issue 2 years ago • 8 comments

Description

Currently, we have different practices regarding license notices in different parts of the SecureDrop project.

  • SecureDrop Client uses full AGPL-3.0-or-later headers in each file (example)
  • SecureDrop servers don't display them per-file (example)
  • I guess two examples are enough to show the practice is different?

There are arguments for and against the various options for the placement and content of license notices, which I'll do my best to recap below.

While this is not an urgent topic, this issue is meant to collect thoughts, so that we can make a conscious decision about what our preferred practice is.

gonzalo-bulnes • Jan 10 '22 22:01

As far as I understand:

  • The GNU Affero General Public License encourages providing a notice in each file alongside the copyright line. [1]

  • The copyright line is somewhat separate from the licensing notice and has stricter formatting requirements. [1]

Arguments

  • Per-file notices are the least ambiguous
  • Per-file notices are the easiest to find when re-using code [2]
  • Per-file notices can be longer than the code they document
  • Per-file notices can feel tedious (repetitive)
  • Per-file notices mean that moving code to its own file is sometimes not recognized as a move (vs. a file rewrite) by GitHub (my very specific itch right now)
  • Per-file notices can be short when used in conjunction with project/package full notices. (example)

As far as I know, there are no strong arguments for longer vs shorter notices.

Am I missing anything @legoktm @eloquence @conorsch ?

gonzalo-bulnes • Jan 10 '22 22:01

At this point, I believe that a short per-file notice + copyright line(s) would fulfill the re-usability goals while mitigating most of the perceived inconvenience, so that would be my personal preference. (Not feeling strongly about it, though.)

Using the SPDX license identifiers seems like a good idea to me, because they allow automation while being readable (YMMV).

For the copyright lines, my understanding is that the word "copyright" MUST be used, that the symbol doesn't hurt but is not mandatory, and that the years when the code was released MUST be present too (using a range is OK where accurate). [1]

Example for a Python file:

# SPDX-License-Identifier: AGPL-3.0-or-later
# Copyright © YYYY-YYYY The Freedom of the Press Foundation.
from foo import bar

# ...
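To illustrate the "allow automation" point, here is a minimal sketch of how a script could read such a two-line notice back out of a file. The function name `read_notice` and both regular expressions are assumptions made up for this example; they are not part of SPDX or of any SecureDrop tooling, and they only cover `#`-style comments.

```python
import re

# Illustrative patterns only (assumptions), not an official SPDX parser.
SPDX_RE = re.compile(r"^#\s*SPDX-License-Identifier:\s*(?P<license>.+?)\s*$")
COPYRIGHT_RE = re.compile(
    r"^#\s*Copyright\s+(?:©\s*|\(C\)\s*)?"
    r"(?P<years>\d{4}(?:-\d{4})?)\s+(?P<holder>.+?)\s*$",
    re.IGNORECASE,
)


def read_notice(source: str, max_lines: int = 10) -> dict:
    """Return the license identifier, years and holder found near the top of a file."""
    notice = {"license": None, "years": None, "holder": None}
    # Only look at the first few lines, where a header is expected to live.
    for line in source.splitlines()[:max_lines]:
        spdx = SPDX_RE.match(line)
        if spdx:
            notice["license"] = spdx.group("license")
            continue
        copyright_line = COPYRIGHT_RE.match(line)
        if copyright_line:
            notice["years"] = copyright_line.group("years")
            notice["holder"] = copyright_line.group("holder")
    return notice


sample = (
    "# SPDX-License-Identifier: AGPL-3.0-or-later\n"
    "# Copyright © 2022 The Freedom of the Press Foundation.\n"
    "from foo import bar\n"
)
notice = read_notice(sample)
```

A file without a header simply yields `None` for every field, which a CI job could treat as "falls back to the repo-level LICENSE" or as an error, depending on the practice we settle on.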

gonzalo-bulnes • Jan 10 '22 23:01

I personally favor repo-level LICENSE files over per-file notices, but don't feel very strongly one way or another. Of course, if a repo combines multiple licenses directly then it's useful to have file-level notices, but AFAIK that doesn't apply to existing repos impacted here.

IMO the argument that per-file notices (as opposed to a repo-level LICENSE file) help with correct re-use lacks empirical evidence. GitHub makes the license information visible at the repo-level, which seems largely sufficient:

  • People who attempt to behave in a license-compliant manner will typically know to look there or in the LICENSE file itself.
  • Those who do not care about correct re-use will happily copy/paste only the code they care about or even intentionally strip notices.

Adding a short per-file license string seems to have an educational value of approximately zero (nobody who doesn't already know what it means will be enlightened by it), but may help trace a file's origin/license if it happens to end up being copied around carelessly. Whether that's worth a set of commits touching every file of code in every impacted repo and ensuring that practice is maintained going forward...

eloquence • Jan 10 '22 23:01

Whether that's worth a set of commits touching every file of code in every impacted repo and ensuring that practice is maintained going forward...

Good observation. This is a typical example of a change I would not go and make across the board. Instead, I'd advocate for making a decision, then applying it as we get opportunities to touch the files. YMMV

gonzalo-bulnes • Jan 10 '22 23:01

My preference is for per-file license information to facilitate copying/reuse in both directions: us wanting to copy stuff and others wanting to copy our stuff. I've done enough complete copyright audits (mostly for Debian), and been burned often enough by vague license statements that left me trawling through the Wayback Machine for a contact point to get a real license statement, that I just prefer things to be as explicit as possible and want the code I work on to reflect that best practice.

In my experience there's a third category of people: developers who don't really understand licenses (because free software licensing and copyright in general are incredibly complicated!), but are acting in good faith. This tends to be the group I encounter the most, though I only have anecdotal evidence for that.

And... with the short SPDX tags, per-file licensing becomes unobtrusive and easy enough to do that I think it's worth it.

legoktm • Jan 12 '22 19:01

In my experience there's a third category of people: developers who don't really understand licenses (because free software licensing and copyright in general are incredibly complicated!), but are acting in good faith. This tends to be the group I encounter the most, though I only have anecdotal evidence for that.

At a way smaller scale, that's my impression/experience too. 👍 (And I include myself in that group at times, despite my strong interest in the topic.)

gonzalo-bulnes • Jan 12 '22 20:01

I'm adding a +1 for per-repo/per-library licensing information versus per-file for several reasons:

  • Per-file means more maintenance for us. Think of how many files we're constantly adding, moving, removing, and updating. If we were to move towards a per-file format, let's add some automation first.
  • Sometimes, it doesn't make much sense to add copyright or licensing information, e.g. when a file is empty or a single line of code (it'd be kind of funny though)
  • I don't buy the argument that a two-line code comment, such as the one suggested, would help developers know their rights and responsibilities when using our code. A header could help if it linked to our LICENSE file, which includes copyright and educational links, e.g. https://opensource.guide/legal/: copyright-and-license...
  • If we want to encourage people to reuse our code, we could add more information in the README section on licensing, e.g. https://github.com/freedomofpress/securedrop#license, that makes it easier to understand how to properly use our code.
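As a rough idea of what "some automation" could look like for the first two points, here is a minimal sketch that prepends the short notice to a Python source string. The helper name `add_notice` and the template constant are hypothetical; the license and copyright holder are the ones suggested earlier in this thread. It assumes we would want to skip empty files, leave already-tagged files alone, and keep a shebang or encoding declaration on the first lines.

```python
import re

# Hypothetical template, based on the two-line notice suggested in this thread.
HEADER_TEMPLATE = (
    "# SPDX-License-Identifier: AGPL-3.0-or-later\n"
    "# Copyright © {years} The Freedom of the Press Foundation.\n"
)

# Encoding declarations (PEP 263) must stay on line 1 or 2 to be honoured.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*[-\w.]+")


def add_notice(source: str, years: str) -> str:
    """Return source with the short notice added, unless empty or already tagged."""
    if not source.strip():
        return source  # empty file: nothing worth annotating
    lines = source.splitlines(keepends=True)
    if any("SPDX-License-Identifier:" in line for line in lines[:10]):
        return source  # already tagged near the top: leave untouched

    # Count the leading lines that have to stay first (shebang, then encoding).
    keep = 0
    if lines[0].startswith("#!"):
        keep = 1
    if keep < len(lines) and CODING_RE.match(lines[keep]):
        keep += 1

    prefix = "".join(lines[:keep])
    if prefix and not prefix.endswith("\n"):
        prefix += "\n"
    return prefix + HEADER_TEMPLATE.format(years=years) + "".join(lines[keep:])


tagged = add_notice("#!/usr/bin/env python3\nprint('hello')\n", "2022")
```

Running it twice on the same content changes nothing the second time, so it could be wired into a pre-commit hook without producing noise. Whether one-line files deserve a header is still a judgment call this sketch does not make.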

(Also see past discussion: https://github.com/freedomofpress/securedrop/issues/6219)

sssoleileraaa • Feb 15 '22 02:02

The two-line code comment I'm referring to is:

   # SPDX-License-Identifier: AGPL-3.0-or-later
   # Copyright © 2022 The Freedom of the Press Foundation.

sssoleileraaa • Feb 15 '22 02:02

@kushaldas, any recommendations on per-repo/per-library licensing information versus per-file, and wording?

sssoleileraaa • Sep 26 '22 16:09

@kushaldas, any recommendations on per-repo/per-library licensing information versus per-file, and wording?

We can follow the simple SPDX suggestion of adding per file copyright/license information, for all the Python files to start with. We also need to have the standard LICENSE file. I remember @anweshadas started writing this down in a branch; maybe she can create a PR in the coming days filling in the information.

kushaldas • Sep 26 '22 16:09

sounds great. @anweshadas: feel free to at-mention me on a PR for review if you'd like :)

sssoleileraaa • Sep 28 '22 19:09

We can follow the simple SPDX suggestion of adding per file copyright/license information, for all the Python files to start with. We also need to have the standard LICENSE file.

I'm glad to see we've reached broader agreement on the how! 🙂 For consistency, I'll take that decision as the way forward in securedrop-client, where we raised a similar issue.

gonzalo-bulnes • Sep 28 '22 22:09

My 2 cents: I also favour repo-level standard LICENSE files… but also think that two-line comments are fairly unobtrusive. That being said, touching all the files to add the comments is icky, as it resets valuable at-a-glance information during repo browsing, while doing it opportunistically could mean spending years catching up, because we have a lot of stuff we don't touch that often.

If I had to order my preferences it'd be:

  1. Treat standardised LICENSE file as "catch-all", plus SPDX comments for exceptions (kind of like the material design files in this commit)
  2. SPDX comments for all relevant files and adding them in one go
  3. SPDX comments for all relevant files but adding them opportunistically as development allows
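For what it's worth, option 1 is also easy to reason about in tooling. Here is a minimal sketch, assuming a hypothetical helper `effective_license` and a repo default read from the LICENSE file by some other means: a per-file SPDX tag wins when present, and everything else falls back to the catch-all. The Apache-2.0 tag below is purely illustrative and not a claim about any particular file in our repos.

```python
import re

# Matches the identifier on any line carrying an SPDX tag (illustrative pattern).
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*$", re.MULTILINE)


def effective_license(source: str, repo_default: str) -> str:
    """Per-file SPDX tag wins; otherwise fall back to the repo-level license."""
    match = SPDX_TAG.search(source)
    return match.group(1) if match else repo_default


# A hypothetical vendored file that carries its own tag (the exception)...
vendored = "# SPDX-License-Identifier: Apache-2.0\nICON = 'menu'\n"
# ...and an ordinary project file with no header at all (the catch-all case).
ordinary = "from foo import bar\n"

licenses = {
    "vendored": effective_license(vendored, "AGPL-3.0-or-later"),
    "ordinary": effective_license(ordinary, "AGPL-3.0-or-later"),
}
```

With this rule, only the exceptions need a header, which keeps the number of touched files small.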

eaon • Nov 04 '22 20:11