ModelicaSpecification

Preserve top-of-file SPDX headers (e.g., REUSE compliance)

Open gwr-de opened this issue 8 months ago • 21 comments

Context

The REUSE Specification mandates that every source file carry its SPDX license header “at the very top of the file” (aside from shebangs or XML prologs). To my knowledge, Modelica tools today treat comments as non-semantic trivia and may move or drop them when reformatting, which breaks REUSE compliance.

Issue

  1. Headers get moved or lost. SPDX tags placed before the within …; clause can be relocated or removed by code generators/formatters.
  2. Side-cars aren’t enough. Relying on annotations, HTML docs, or external license files doesn’t satisfy REUSE’s in-file requirement, e.g., #3611.

Proposal

Spec text

Require that every .mo file begin—with only an optional she-bang/XML prolog before it—with a REUSE-compliant SPDX comment block. Tools must preserve this block verbatim and refuse to relocate or strip it.

Tooling guidance

Reference implementations should detect and lock the top comment block in place, and disable reformatting that would move it.

Clarification

Explicitly state that annotations or out-of-band docs do not replace an in-file header.

gwr-de, Apr 25 '25 12:04

This needs more thought, and I'm not sure if it is a good idea:

  • We have already added license files in a standardized place (#2900), since many Modelica packages have parts under different licenses; those links are not necessarily file-based.
  • It is not required to store Modelica code in source files (and we don't always do it).
  • Similarly, Modelica tools are often designed to show the Modelica classes in the package, so the idea of having it in source files that users don't see seems like a step backwards.
  • There are also encrypted packages (both vendor-specific and standardized ones) where you cannot easily look at the source files. I know they are rarely under open-source licenses, which shows that we need more than SPDX.
  • The main issue I see with #3611 is how to ensure that the information is correct. E.g., if you copy (and modify) a class, it is a derivative work, so it is not necessarily under the same license; one reason we use a BSD license for MSL is that it is easy to have a more restrictive license for derived work.

HansOlsson, Apr 25 '25 13:04

I'm not convinced—here's why:

  • One file, one class. Almost every non-trivial library stores one class/type per .mo file.
  • Tool access. Developers work in IDEs and CLIs, not just proprietary Modelica platforms.
  • Legal granularity. The smallest unit those tools see is the file itself, so license / copyright info belongs there.
  • Proven solution. Software projects everywhere have already solved how to include accurate, even multi-license headers in each file—my point is that even a single-license header needs the same in-file clarity, independent of any external tooling.

Encrypted bundles or GUI-only components can use manifests or dialogs, but that’s an edge case. For normal, open-source .mo libraries, front-loaded headers are by far the simplest way to keep license info both human- and machine-friendly.

gwr-de, Apr 25 '25 13:04

  • From a legal perspective, the clarification of which license and which copyright applies belongs into the smallest "physical" unit of storage accessible by CLIs/IDEs and that's a file.

Well,

  • As previously stated: not all IDEs for Modelica work with files
  • Most licenses don't require that the license is included in each source file (even if some common ones recommend it) https://softwareengineering.stackexchange.com/questions/317041/should-i-add-the-license-in-every-header-and-source-file

Thinking more about it, I believe adding the copyright notice to all source files would be problematic, since it gives the impression that the notice is needed in the files to protect them, or the even worse idea that code without a copyright notice isn't protected by copyright. Note that many Modelica libraries also contain compiled libraries, images, data files, etc. Those binary files are also covered by the same copyright as the rest of the package (unless otherwise specified), due to how copyright works (according to the Berne Convention, with later updates such as those from the Uruguay Round).

However, I am not a lawyer.

HansOlsson, Apr 25 '25 14:04

Hans, thanks for digging up that old discussion—but it’s from over eight years ago (“an eternity” in software engineering!). In the meantime, best practices have evolved. For example, the GNU GPL FAQ still recommends including a license notice in each source file:

“You should place a copyright notice and license notice in each source file, to make it clear what it is covered by.” https://www.gnu.org/licenses/gpl-faq.en.html#NoticeInSourceFile

Here’s why per-file headers matter from a developer’s perspective (also note my edited previous answer):

  • Effortless by design. I can add a single SPDX line at the top of a file without worrying about tools moving or stripping it (simple to do using a pre-commit hook).
  • File-level granularity. There’s no guarantee code is ever copied in big bundles—developers work one .mo file (one class) at a time in IDEs and CLIs.
  • Legal clarity. The smallest “change-and-editable” unit is the file itself, so license and copyright belong there, independent of any platform or doc generator.
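The pre-commit hook mentioned in the first point might look roughly like the Python sketch below. It is a hypothetical helper, not part of pre-commit or the REUSE tool: it assumes line comments (//) are acceptable for the header, and the license and copyright values are placeholders. The header goes on the first line, or right after a shebang or XML prolog line.

```python
"""Hypothetical pre-commit helper: prepend an SPDX header to .mo files lacking one."""
import sys
from pathlib import Path

SPDX_TAG = "SPDX-License-Identifier:"


def add_spdx_header(text: str, license_id: str, copyright_text: str) -> str:
    """Return text with an SPDX line-comment header at the first possible position.

    Files that already mention the tag in their first 10 lines are returned unchanged.
    """
    lines = text.splitlines(keepends=True)
    if any(SPDX_TAG in line for line in lines[:10]):
        return text  # already tagged: keep the file byte-for-byte identical
    header = [
        f"// SPDX-FileCopyrightText: {copyright_text}\n",
        f"// SPDX-License-Identifier: {license_id}\n",
    ]
    # Keep a shebang or XML prolog on line 1, as REUSE allows.
    skip = 1 if lines and lines[0].startswith(("#!", "<?xml")) else 0
    if skip and not lines[0].endswith("\n"):
        lines[0] += "\n"  # make sure the header starts on its own line
    return "".join(lines[:skip] + header + lines[skip:])


def main(paths: list[str]) -> int:
    changed = 0
    for p in map(Path, paths):
        old = p.read_text(encoding="utf-8")
        # Placeholder values; a real hook would read these from its configuration.
        new = add_spdx_header(old, "GPL-3.0-or-later", "2025 Jane Doe")
        if new != old:
            p.write_text(new, encoding="utf-8")
            changed += 1
    # A non-zero exit makes pre-commit report the modified files.
    return 1 if changed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Registered as a local pre-commit hook, it would receive the staged .mo file names as arguments and fail the commit once after tagging them, so the change can be reviewed. The REUSE tool's own reuse annotate command is an existing alternative worth trying first.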

In short, putting the header in every .mo gives developers a rock-solid, tool-agnostic way to keep license info correct and visible—exactly where it needs to be. Thoughts?

gwr-de, Apr 25 '25 14:04

Minimum requirement

As an open-source developer, I must be able to insert SPDX headers and related legal information—either in a block comment (/* … */) or as a series of line comments, so long as at least one line contains an SPDX header—into any .mo file, and those comments must remain exactly where I place them (no tool may move or delete them).

EDIT In other words:

Tools that re-serialize or format Modelica code MUST preserve any comment tokens exactly where they occur in the source text (i.e. must not treat comments as discardable whitespace).

gwr-de, Apr 25 '25 14:04

Here's an example what I usually do: https://github.com/modelica-3rdparty/ExternData/blob/38187b6eef5e0e2637198ae6ae3fbb967bb96bdb/ExternData/package.mo#L1-L8

beutlich, Apr 25 '25 15:04

@beutlich Thanks. But is there a guarantee that the comment is a) preserved and b) not relocated to a different position, given that comments are treated as whitespace according to Section 2.2? All I want is to be able to do exactly what you are doing, even if I feel that it should be done for each file (as recommended by the FSF).

gwr-de, Apr 25 '25 15:04

Well, at least there is some self-made guarantee, because I only edit Modelica files in a text editor, try to have as few Modelica files as possible, and manually check each commit for changes.

beutlich, Apr 25 '25 15:04

I am using Wolfram System Modeler for development, and by default everything is either stored in one large file or split into one file per class. A comment like yours before within ...; is currently moved, even if I originally saved it at the top of the .mo file.

TL;DR Self-made guarantees don’t stick—only Spec-level guarantees will. ;-)

gwr-de, Apr 25 '25 15:04

Section 18.13 makes it the developer’s sole responsibility to include the required license texts. To reinforce this requirement, I believe it prudent to acknowledge the following tooling requirements for comments:

  1. Developer freedom. A developer may insert SPDX headers and other legal notices in comments (blocks or line form) anywhere in a .mo file.
  2. Tool obligation. Any Modelica-compliant tool that outputs .mo files shall leave those comment blocks exactly where placed, without alteration or removal.
  3. Warning on non-compliance. If a tool cannot preserve comments (e.g., due to parsing limitations), it shall emit a warning and refuse to perform the re-serialization.
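Point 3 could be implemented as a simple guard around any re-serialization step. The Python sketch below uses hypothetical function names: it collects all // and /* */ comments in order, skipping string literals and quoted identifiers (where those character sequences are not comments), and refuses to use the new text if that sequence changed. It only compares comment text and order, so a comment that moves relative to the surrounding code while keeping its order would not be caught.

```python
"""Sketch of 'warn and refuse': compare comments before and after re-serialization."""
import warnings


def extract_comments(src: str) -> list[str]:
    """Return all // and /* */ comments in source order.

    String literals ("...") and quoted identifiers ('...') are skipped,
    honoring backslash escapes, because // or /* inside them is not a comment.
    """
    comments, i, n = [], 0, len(src)
    while i < n:
        c = src[i]
        if c in "\"'":
            j = i + 1
            while j < n and src[j] != c:
                j += 2 if src[j] == "\\" else 1  # jump over escaped characters
            i = j + 1
        elif src.startswith("//", i):
            end = src.find("\n", i)
            end = n if end == -1 else end
            comments.append(src[i:end])
            i = end
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            comments.append(src[i:end])
            i = end
        else:
            i += 1
    return comments


def safe_reserialize(original: str, reserialized: str) -> str:
    """Return the re-serialized text only if every comment survived unchanged;
    otherwise emit a warning and keep the original text (i.e. refuse)."""
    if extract_comments(original) != extract_comments(reserialized):
        warnings.warn("re-serialization would alter or drop comments; refusing")
        return original
    return reserialized
```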

I therefore propose to amend Section 2.2 "Comments" of the Specification as follows:

After

There are two kinds of comments in Modelica which are not lexical units in the language and therefore are treated as white-space by a Modelica translator.

And before

The white-space characters are space, tabulator, and line separators ...

INSERT (Comment preservation)

Any tool that reads a Modelica source file and then writes it back out, whether to pretty-print, split into multiple .mo files, concatenate, or otherwise re-serialize, shall preserve every comment sequence (//… and /*…*/) verbatim at its original character-stream position. Comments shall not be moved, merged, split, or removed.


If there's consensus, should I open a pull request to implement these changes?

gwr-de, Apr 25 '25 18:04

Hans, thanks for digging up that old discussion—but it’s from over eight years ago (“an eternity” in software engineering!). In the meantime, best practices have evolved.

Different groups have different practices, and in this case it seems they are following the US pre-1976 copyright rules (which did require a work to carry a copyright notice to be protected by copyright); that's how slowly things sometimes move in software engineering.

A different practice is to add the license on GitHub.

Any tool that reads a Modelica source file and then writes it back out, whether to pretty-print, split into multiple .mo files, concatenate, or otherwise re-serialize, shall preserve every comment sequence (//… and /*…*/) verbatim at its original character-stream position. Comments shall not be moved, merged, split, or removed.

That is a really strict requirement, but for SPDX-comments you only need to preserve the ones at the start of the file, so why make it so general?

A problem with other comments is that IDEs allow changes (graphically or otherwise), so the "original character-stream position" can be meaningless after those changes. However, all of those changes are changes inside the class, and thus not relevant for SPDX. (Similarly, splitting/concatenating isn't relevant for SPDX comments, and the proposed wording is not really how those restructuring operations work at all.)

However, the main issues are still:

  • Copyright applies to everything you write, so SPDX comments aren't needed; but we can still have them
  • Not all tools/IDEs use source files
  • Making new classes (stored in new files) isn't a major operation in all tools
  • Non-source artifacts like images are also covered by licenses, without the header

HansOlsson, Apr 27 '25 17:04

As they explain in the GNU FAQ, putting SPDX headers and other legal comments into source files is not a legal requirement per se if all your files are copied together and, say, a LICENSE file or a LICENSES directory is included.

But, even with Modelica, the open-source world is a world of “picking and using” whatever you need, and the FSF/REUSE recommendation targets the future (not the present) state of files in a project, where copyright notices and licenses may have become separated from individual files. The easiest and most straightforward way of avoiding any confusion and misconception is to put SPDX and similar machine-readable commentary at the beginning of every source file (however classes are mapped to files).

BTW, the recommendation for a binary file somebinary is to add a somebinary.license file right next to it, containing the same kind of information as the header block.

gwr-de, Apr 27 '25 21:04

Let’s narrow the requirement to exactly what matters for license headers then:

Comment preservation (SPDX only). Any tool that re-serializes a .mo file—whether to pretty-print, split, concatenate, or export—shall preserve the very first contiguous block of comments (line or block form) immediately before the first non-shebang/XML token (typically the within clause) verbatim. All other comments may be handled as whitespace.
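To make the proposed wording concrete, here is a minimal Python sketch of how a tool might extract the protected block and check that it survived re-serialization. Assumptions: a shebang or XML prolog occupies its own first line, and whitespace (including blank lines) between comments does not end the block; leading_comment_block and header_preserved are illustrative names only.

```python
"""Sketch: extract the leading comment block that the proposed rule protects."""


def leading_comment_block(src: str) -> str:
    """Return the first contiguous run of // and /* */ comments (with the
    whitespace between them) before the first real token, skipping an
    initial shebang or XML prolog line. Returns "" if there is none."""
    i = 0
    if src.startswith(("#!", "<?xml")):
        nl = src.find("\n")
        i = len(src) if nl == -1 else nl + 1
    start = end = i
    while i < len(src):
        if src[i].isspace():
            i += 1
        elif src.startswith("//", i):
            nl = src.find("\n", i)
            i = end = len(src) if nl == -1 else nl
        elif src.startswith("/*", i):
            close = src.find("*/", i + 2)
            i = end = len(src) if close == -1 else close + 2
        else:
            break  # first real token, typically the within clause
    return src[start:end].strip()


def header_preserved(original: str, reserialized: str) -> bool:
    """True if the re-serialized file starts with the identical header block."""
    return leading_comment_block(original) == leading_comment_block(reserialized)
```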

This refined wording covers the SPDX-header guarantee without over-constraining IDE workflows. @HansOlsson Does this address your concerns?

gwr-de, Apr 27 '25 21:04

Handling comments (and whitespace in general) sensibly in an environment where a user can:

  • Edit a text view of a class
  • Edit a text view of a class that contains contents from multiple files (natural case in Modelica)
  • Edit most parts of a class through GUI operations
  • Re-organize which classes are stored in their own file or in the file of the parent class at will

is really really hard. Not only technically, but in some cases, even saying what the "correct" or "expected" behavior should be.

Any tool that re-serializes a .mo file—whether to pretty-print, split, concatenate, or export—shall preserve the very first contiguous block of comments (line or block form) immediately before the first non-shebang/XML token (typically the within clause) verbatim. All other comments may be handled as whitespace.

It's not immediately obvious what "preserve" would mean in all scenarios. I'll just give one which is not obvious to me:

If you have a package like this:

A (in package.mo; this file has a leading comment)
|- B (also in the above package.mo; also has a comment just before the class begins)
|- C (also in the above package.mo; has no leading comment)

and using GUI functionality to move A.B and/or A.C into their own file(s), what would the expected result be? If you then later join them back together, what should happen?

(Since System Modeler was called out explicitly in a comment above, I will mention that the upcoming version has whitespace and comment handling reworked from the ground up, and will preserve original/user created formatting to a very high degree. Regrettably, it will still move a top comment as discussed in this issue to just inside the class. I hope we will do better in this case as well in a future version.)

maltelenz, Apr 28 '25 08:04

I can completely understand what you are saying, Malte. Currently, I find GUIs great for modeling, but they tend to show diminishing returns once you get to more serious development, which still ends up being "programming."

I would suggest tying a comment placed immediately at the top of a file, or immediately before a class definition, to that class (I am not a computer scientist, but it might be something like a class-specific, level-0 "preamble"; in any case, such comments should be more than just whitespace). It should be moved around with that class but, of course, not be inherited.
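One way to read this suggestion as a data model, sketched in Python with hypothetical names (ClassDef, to_file, split_into_files): the preamble is stored on the class rather than as whitespace in a file, so when classes are split into their own files, each file starts with its class's preamble followed by the within clause. Only one package level is handled to keep the sketch short.

```python
"""Sketch of the 'preamble' idea: a leading comment belongs to its class
and travels with it when classes are split into separate files."""
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    name: str
    body: str                    # class definition text, without the preamble
    preamble: str = ""           # leading comment(s), e.g. an SPDX header
    children: list["ClassDef"] = field(default_factory=list)


def to_file(cls: ClassDef, within: str) -> str:
    """Serialize one class to its own file: preamble first, then the within clause."""
    parts = [cls.preamble] if cls.preamble else []
    parts.append(f"within {within};" if within else "within;")
    parts.append(cls.body)
    return "\n".join(parts) + "\n"


def split_into_files(pkg: ClassDef) -> dict[str, str]:
    """One file per class for a top-level package and its direct children
    (nested packages are left out to keep the sketch short)."""
    files = {f"{pkg.name}/package.mo": to_file(pkg, "")}
    for child in pkg.children:
        files[f"{pkg.name}/{child.name}.mo"] = to_file(child, pkg.name)
    return files
```

For the package A/B/C example above, this model would give B.mo its own comment at the top, C.mo none, and package.mo would keep A's comment; joining the files again would presumably put B's preamble directly before model B inside package.mo. That is one possible answer to the question of expected behavior, not an agreed one.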

I still believe that developers should be enabled to do this at their discretion and that this ought to be a Spec-level guarantee.

gwr-de, Apr 28 '25 10:04

I would prefer to see this as a tool issue outside the normative scope of the Modelica language specification.

I think we can all see that managing comments in general is not a well-defined task for GUI operations such as creating, deleting or moving elements, or changing how the package hierarchy is broken down into files. However, the grammar we already have allows comments anywhere in the stored-definition, and a comment appearing before the class-definition seems particularly well suited for being properly preserved unless the breakdown into files is changed. One could consider adding a non-normative sentence or two to point this out, but I'd rather leave the problem of preserving white-space and comments out of the specification completely.

henrikt-ma, Apr 28 '25 21:04

I appreciate the perspective that comment handling borders on “tool territory,” but from a library developer’s vantage point it really feels like our source code is treated as a second-class citizen. In Modelica, unlike C, Python, Julia or Rust, your .mo files exist only as a transient serialization of an internal model. Insert an SPDX header at the top of a big file, re-export to one-file-per-class, and your legal boilerplate vanishes.

That’s no accident: Modelica’s specification (§ 2.2) treats comments as whitespace, so tools are free to discard or move them without warning. But in every other serious open-source ecosystem we already draw a clear line: the first comment of each source file carries machine-readable licensing (SPDX). See, for example:

  • SPDX spec (ISO 5962) and FSFE REUSE both recommend per-file tag–value headers.
  • The GNU GPL FAQ recommends putting a license notice in each source file.
  • The Linux kernel mandates an SPDX-License-Identifier at the very first comment-able line.
  • Many languages and ecosystems follow this pattern.

From a legal standpoint—under the Berne Convention and copyleft licenses—relocating or dropping your top-of-file license text without warning isn’t just bad engineering; it strips you of the very rights your chosen license guarantees. After all, it’s the developer who bears the legal risk—not the tool vendor—effectively letting proprietary exporters off the hook. It forces developers into a never-ending game of “guess which tool respects my header today.”

So, again, here's my proposal:

  1. Add a non-normative note in § 2.2 that explains why a first comment block (e.g., the SPDX header) should be exempt from being treated as mere whitespace.
  2. Follow up with a normative spec amendment, e.g.,

Tools shall preserve the first contiguous block of comments (line or block form) appearing at the first possible comment position in a .mo file—that is, at the very first line, excepting only an initial shebang (#!…) or XML prolog (<?xml…?>)—verbatim when re-serializing the file. This SPDX header block shall not be moved or removed.

That minimal spec-level guarantee—and the visibility it brings—lets every developer know at a glance that their license header is special, not disposable, in both technical (SBOM/SCA) and legal (copyleft compliance) senses.

As long as we leave this to ad-hoc tool behavior, open-source Modelica authors will remain at the mercy of whichever vendor happens to be writing the exporter. Let’s give them a right they can rely on, rather than a promise that quietly disappears.

gwr-de, Apr 30 '25 12:04

What can and should be the next steps in this regard?

gwr-de, May 06 '25 18:05

Language meeting notes: Tools should try to preserve the comments (neither moving nor removing them), but there are a number of cases where this will not work well.

See also https://specification.modelica.org/master/annotations.html#license-texts (added in 3.7), which can be used. However, it does not seem to satisfy this need, even though:

  • tools can automatically include it, e.g., for FMUs.
  • tools can ask when copying (with possible exceptions for examples).

HansOlsson, May 08 '25 14:05

Commenting as requested during language meeting.

Modelon Impact will preserve a comment before the within clause for most simple editing operations, including dragging and dropping a new component in the GUI and saving with the code editor.

There are some operations where it is not preserved, though, for example, duplicating a class.

MarkusOlssonModelon, May 08 '25 15:05

It’s great to see this proposal being discussed. Recently, I experimented with the REUSE framework in a new Modelica project, and I’d like to share my first experiences, which illustrate how future tooling might guide users toward full compliance.

Why Per-File Compliance Matters

In complex Modelica libraries, mixing different assets—Modelica code, SVG icons, binaries—requires per-file licensing metadata so that each artifact is properly attributed and redistributable.

Project Setup for REUSE Compliance

REUSE treats each file in isolation. In my project:

  • SVG icons carry the Apache-2.0 license and live in a subdirectory of the Resources/ folder. They are embedded in partial class components “by reference.”
  • Modelica source code is licensed under a strong copyleft license (e.g., GPL-3.0-or-later).
  • Other assets (binaries, docs) may use different licenses or carry proprietary exemptions (e.g., trademarks).

The two pillars of compliance are: (1) SPDX tags at the top of every source file, and (2) an optional REUSE.toml to declare bulk attributions.

LICENSES/ Directory

At the project root, include a LICENSES/ folder containing each license’s full text verbatim, named exactly as in the SPDX license list:

LICENSES/
├─ Apache-2.0.txt
├─ GPL-3.0-or-later.txt
├─ CC0-1.0.txt
└─ LicenseRef-ProjectName-trademark.txt   # proprietary trademark terms

The trademark file (covering, e.g., logos) states exemptions not covered by FOSS licenses.
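As a rough local cross-check of this layout, the Python sketch below collects license identifiers from SPDX-License-Identifier tags (in file headers, .license files, and REUSE.toml entries) and reports those without a matching LICENSES/<id>.txt. It is a simplification of what reuse lint reports as missing licenses: it assumes single identifiers rather than SPDX expressions such as MIT OR Apache-2.0, and missing_license_texts is a hypothetical name.

```python
"""Sketch: find license IDs that are used but have no LICENSES/<id>.txt."""
import re
from pathlib import Path

# Matches 'SPDX-License-Identifier: X' in comments and .license files, and
# 'SPDX-License-Identifier = "X"' in REUSE.toml. Single identifiers only.
TAG = re.compile(r'SPDX-License-Identifier\s*[:=]\s*"?([A-Za-z0-9.+-]+)')


def missing_license_texts(root: Path) -> set[str]:
    """Return license IDs referenced under root that lack LICENSES/<id>.txt."""
    used: set[str] = set()
    for f in root.rglob("*"):
        # Skip directories and the license texts themselves
        # (a real tool would also skip .git and similar directories).
        if f.is_file() and "LICENSES" not in f.relative_to(root).parts:
            used.update(TAG.findall(f.read_text(encoding="utf-8", errors="ignore")))
    available = {p.stem for p in (root / "LICENSES").glob("*.txt")}
    return used - available
```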

SPDX Tagging

Direct Tagging

Embed SPDX tags at the top of each .mo, .svg, or text-based source. For binaries, place a .license file alongside:

SomeLogo.png
SomeLogo.png.license

The content of the SomeLogo.png.license file may then simply be:

SPDX-FileCopyrightText: [year] [name] [email]
SPDX-License-Identifier: LicenseRef-ProjectName-trademark
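Creating such sidecar files can be scripted. A minimal Python sketch, where write_license_sidecar is a hypothetical helper and the tag values are placeholders:

```python
"""Sketch: write a REUSE-style .license sidecar next to a binary file."""
from pathlib import Path


def write_license_sidecar(binary: Path, copyright_text: str, license_id: str) -> Path:
    """Create <binary>.license carrying the tags a text file would have in its header."""
    sidecar = binary.with_name(binary.name + ".license")
    sidecar.write_text(
        f"SPDX-FileCopyrightText: {copyright_text}\n"
        f"SPDX-License-Identifier: {license_id}\n",
        encoding="utf-8",
    )
    return sidecar
```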

Indirect Tagging via REUSE.toml

A central REUSE.toml can reduce repetition. Example:

# REUSE.toml for ProjectName
version = 1

# Modelica sources
[[annotations]]
path = ["ProjectName/**/*.mo", "ProjectName/**/*.order"]
precedence = "closest"
SPDX-FileCopyrightText = "[year] [name] [email]"
SPDX-License-Identifier = "GPL-3.0-or-later"

# Git metadata files
[[annotations]]
path = [".gitattributes", ".gitignore"]
precedence = "closest"
SPDX-FileCopyrightText = "[year] [name] [email]"
SPDX-License-Identifier = "CC0-1.0"

CI-Driven Compliance Check

Integrate reuse lint into your CI pipeline to enforce compliance automatically:

❯ reuse lint
# SUMMARY

* Bad licenses: 0
* Deprecated licenses: 0
* Licenses without file extension: 0
* Missing licenses: 0
* Unused licenses: 0
* Used licenses: LicenseRef-ProjectName-trademark, CC0-1.0, GPL-3.0-or-later, Apache-2.0
* Read errors: 0
* Files with copyright information: 512 / 512
* Files with license information: 512 / 512

Congratulations! Your project is compliant with version 3.3 of the REUSE Specification :-)

gwr-de, May 09 '25 12:05