Develop and create and code a new formal model of Free and Libre and Open-Source licenses
Project description
Moved from https://github.com/FHPythonUtils/LicenseMatrix/issues/8
Sometimes one has to combine software and data into other software automatically. Unfortunately, in the current legal system copyright is not yet abolished.
This forces us to compute the effect that combining works distributed under different terms has on the combined work. And if the works are combined automatically, this computation should be done automatically too.
There are some machine-readable descriptions of licenses, such as
- TL;DRLegal dataset
- GitHub "ChooseALicense" dataset.
Pioneered by TL;DRLegal and later mimicked by GitHub, the main part of such a description consists of lists of enum values, where each enum value is ideally meant to have a precise meaning. There are 3 lists: one describes what someone is permitted to do, another describes what a licensee is required to do, and the third describes what someone would have a hard time doing.
This model is flawed; it has no real use other than drawing beautiful tables with indicators:
- Do you notice the word `someone`? It is there because the model is a mess: sometimes features in the `can't`/`limitations` list mean that the author cannot, sometimes that a contributor cannot, sometimes that a user cannot. The same shit with the other lists.
- Features in lists have arbitrary meanings.
- The features are mostly binary, but are placed into different columns instead of setting a binary value for a feature;
- Assigning the same label to different columns simultaneously is possible;
- The same label may mean very different things in different licenses.
This model is inadequate to the domain, it cannot be really used, and we have to create a new one.
So, the goals are:
- design a model that
  - is adequate to the domain
  - allows operations with the licenses
  - allows derivation of high-level license features out of low-level features. An example of a high-level feature is `permissive/copyleft/proprietary`
- create a dataset of licenses described in this model
- create a software library implementing the operations
The model
The flaws of the model used by the existing datasets originate from the fact that the information about which parties and relations each restriction applies to is lost, so those encoding the info about licenses have to use descriptions that make very little sense.
We address it by modelling parties, interactions between them, interactions between interactions and parties, and so on as a graph. There is a base graph with all possible interactions on it, and a concrete license is an almost binary vector assigning a boolean to each interaction, meaning whether it is present in the license or not.
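A minimal Python sketch of the "license as a vector over the base graph" idea, under stated assumptions: the interaction names are only the handful that appear in this proposal, not the full base graph, and the `License` class and the `combine` operation are hypothetical illustrations rather than an existing API. The "almost" in "almost binary" is ignored here; every entry is a plain boolean.

```python
from dataclasses import dataclass

# Assumption: a tiny illustrative subset of interactions (metanodes);
# the real base graph is larger.
BASE_INTERACTIONS = ("ccr", "cur", "uer", "etr", "huerr")


@dataclass(frozen=True)
class License:
    """Hypothetical model: a license is the set of interactions present in it."""
    name: str
    present: frozenset

    def __post_init__(self):
        # Reject interactions that the base graph does not know about.
        unknown = self.present - set(BASE_INTERACTIONS)
        if unknown:
            raise ValueError(f"not in the base graph: {sorted(unknown)}")

    def as_vector(self):
        """Boolean vector aligned with BASE_INTERACTIONS."""
        return tuple(name in self.present for name in BASE_INTERACTIONS)


def combine(a, b):
    """Assumed operation: a combined work carries the interactions of both."""
    return License(f"{a.name}+{b.name}", a.present | b.present)


# Made-up licenses, only to show the shape of the data.
credit = License("credit-only", frozenset({"huerr"}))
copyleft = License("copyleft-ish", frozenset({"uer"}))
print(combine(credit, copyleft).as_vector())
```

Whether a plain union is the right way to combine two licenses is exactly the kind of question the model has to answer; it is used here only to show that operations become simple set arithmetic once licenses are vectors.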
First, here is a .dot file expressing the graph: https://github.com/KOLANICH/LicenseMatrix/blob/relation_graph/DOCS/relations.dot
There are some roles; the same person can play multiple roles simultaneously:
- `License author` (l) - the one who has written the license text and copyrighted it. Examples of such authors are the FSF and Sam Hocevar. Can impose restrictions on what can be called a license with a particular name.
- `Copyright Holder` (h) - the one who applies a license, probably a customized one. Doesn't write any code; it is the `contributor` who writes it. If a holder has created a library, he is also a `contributor`.
- `Contributor` (c) - anyone who distributes modified or non-modified versions of a library in source code form. In this context he doesn't hold any copyrights on the library, not even on his own contributions to it. In situations where he does hold copyrights on his contributions, a new graph is created in which he is considered a holder and the original holder is not present, but the holder is subject to the restrictions in the parent graph.
- `User` (u) - anyone who creates a program that uses the library, and also everyone who distributes the library in ready-to-be-executed form.
- `End User` (e) - anyone who uses any program or lib created by a `user` and may gain money from a client.
- `Client of the End User` ("third party") - a client getting commercial services from an `end user`.
And here are examples:
- `A` has written a lib, licensed it under GPL, and distributes the source code via his website. `A` is a `holder` and a `contributor`.
- `A` works for `B` and has developed a lib for `B`, and `B` has paid him for that. `B` applied AGPL and has created a proprietary app available via the Web, access to which he charges for. The lib itself is not distributed by `B`. `B` is a `holder`, a `user`, and an `end user`.
- `A` distributes the sources and binaries of the lib (AGPL allows it), both the original ones and ones with his own modifications that he holds copyrights to, and provides paid support for it (including to `B`). He is a `contributor` and a `user`.
- `C` wants to use the lib in a closed-source service; he is legally required to obtain permission from `B`. `C` is a `user` and an `end user`. If he wants to use the modified version, he also needs permission from `A` (in this case a subgraph is created in which `A` is the holder for his contributions).
- `D` just uses a lib in his app, and uses the app himself to provide pentest services, but doesn't distribute it. He is a `user`, an `end user`, and an `end user client`.
Each role is a node in a graph. Between nodes there are edges: relations between actors. And these edges are nodes in another graph. I guess it may be called a metagraph, but I am not sure. GraphViz doesn't allow edges whose ends are edges, so we introduce "metanodes" in the middle of edges. These nodes are named using the following naming convention: `<source><destination>r`, where sources are always roles, and destinations can be either nodes or metanodes.
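The naming convention can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the repository: the helper names `metanode` and `parse` are made up, roles are assumed to be single letters, and the letter `t` for the third party is inferred from the metanode name `etr` rather than stated in the role list.

```python
# Role letters as given in the role list; "t" for the third party is an
# assumption inferred from the metanode name "etr".
ROLES = frozenset("lhcuet")


def metanode(source, destination):
    """Build a metanode name following <source><destination>r."""
    if source not in ROLES:
        raise ValueError(f"source must be a role, got {source!r}")
    return f"{source}{destination}r"


def parse(name):
    """Invert metanode(): return a role letter or a (source, destination) pair."""
    if name in ROLES:
        return name
    if len(name) < 3 or name[0] not in ROLES or not name.endswith("r"):
        raise ValueError(f"not a role or metanode name: {name!r}")
    # Strip the source letter and the trailing "r", then recurse.
    return (name[0], parse(name[1:-1]))


uer = metanode("u", "e")    # relation between user and end user
huerr = metanode("h", uer)  # holder acting on that relation
print(uer, huerr, parse(huerr))
```

Because sources are always single-letter roles and no role is called `r`, a name can be unwrapped from the outside in without ambiguity, which is what `parse` relies on.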
The same person can play multiple roles simultaneously; in this case the restrictions on himself vanish.
Licenses are tools that make some relations between third parties illegal. Each license is a set of features. Each feature either restricts an interaction in a specific way, or not.
E.g. `give-credit` either requires any user of the lib to add an advertisement of the holder's name to his own lib/app docs, or not. This restriction originates from the holder's will and his privilege to use the current legal system to punish those not following his will.
So, the relation is between `user` and `end user`; it corresponds to the metanode `uer`. The holder imposes a restriction on this relation, creating a relation from himself to that relation, `h` `uer` `r`, so the restriction `give-credit` has an edge to the node `huerr`.
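A hedged sketch of how the feature-to-metanode edges might be stored and queried. Only the `give-credit` to `huerr` edge comes from the text; the table name `FEATURE_EDGES` and both helper functions are hypothetical.

```python
# Assumption: a hypothetical feature table. Only "give-credit" -> "huerr"
# comes from the text; a real dataset would list many more edges.
FEATURE_EDGES = {
    "give-credit": frozenset({"huerr"}),
}


def restricted_nodes(features):
    """Collect every metanode that the enabled features point to."""
    nodes = set()
    for feature in features:
        nodes |= FEATURE_EDGES[feature]  # KeyError on an unknown feature
    return nodes


def imposed_by(node):
    """The first letter of a metanode name is the role imposing the restriction."""
    return node[0]


for node in restricted_nodes({"give-credit"}):
    print(node, "is imposed by role", imposed_by(node))
```

With such a table, the question "who restricts whom" is answered by the node name itself instead of being lost, which is the point of the model.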
The virality class of a library is determined by `virality-boundary`, which is the relation before which actors must apply the same license.
- `virality-boundary = 0` - permissive
- `virality-boundary = ccr + cur` - LGPL
- `virality-boundary = uer` - GPL
- `virality-boundary = etr` - AGPL
Permissive licenses are the ones having the virality boundary shifted infinitely to the "left", plus granting the rights.
I could also think about assigning a similar boundary to each feature, but IDK how useful that would be.
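One possible way to make the virality boundary computable, as a sketch under assumptions: the left-to-right ordering of relations and the rule "the boundary reaching farthest right wins" are my guesses at the intended semantics, and the names `RELATION_ORDER`, `boundary_rank` and `farthest_boundary` are made up.

```python
# Assumption: relations ordered from "left" to "right" along the chain
# contributor -> user -> end user -> third party.
RELATION_ORDER = ("ccr", "cur", "uer", "etr")

# Boundaries as listed above; an empty set stands for "virality-boundary = 0".
BOUNDARIES = {
    "permissive": frozenset(),
    "LGPL": frozenset({"ccr", "cur"}),
    "GPL": frozenset({"uer"}),
    "AGPL": frozenset({"etr"}),
}


def boundary_rank(boundary):
    """0 for an empty boundary, else 1-based position of its rightmost relation."""
    return max((RELATION_ORDER.index(r) + 1 for r in boundary), default=0)


def farthest_boundary(*names):
    """Assumed combination rule: the boundary reaching farthest right wins."""
    return max(names, key=lambda n: boundary_rank(BOUNDARIES[n]))


for name, boundary in BOUNDARIES.items():
    print(name, boundary_rank(boundary))
```

A rank like this gives a total order over the four classes, so deriving the high-level class of a combination reduces to taking a maximum. Whether real license combinations behave that simply is not claimed here.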
Relevant Technology
Complexity and required time
Complexity
- [x] Intermediate - The user should have some prior knowledge of the technolog(y|ies) to the point where they know how to use it, but not necessarily all the nooks and crannies of the technology
Required time (ETA)
- [x] Medium work - A week or two
Categories
- [x] Futuristic Tech/Something Unique
Doesn't SPDX cover a lot of this functionality? https://spdx.org/
Not quite. SPDX covers syntax for combining licenses and adding additional clauses, all of which is treated as a black box. But it doesn't cover the following functionality (which would require modelling the licenses themselves), essential to this proposal:
- determining what effect a license has on relations between actors. E.g. I am allowed to create a derivative work of works under certain licenses, but am not allowed to distribute it. Some tools generating code automatically should generate different code depending on whether the result will be publicly distributed.
- checking license compatibility
- detecting combined license kind (strong copyleft, weak copyleft, permissive, public domain, proprietary)
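To illustrate the first point, here is a rough Python sketch of what a code generator might ask such a library. Everything in it is hypothetical: `LicA` and `LicB` are made-up licenses, the action names are placeholders, and the rule "an action is allowed only if every input license allows it" is an assumption, not a legal statement.

```python
# Hypothetical licenses and action names, for illustration only.
PERMITTED = {
    "LicA": {"derive", "distribute"},
    "LicB": {"derive"},
}


def permitted_for_combination(*license_names):
    """Assumed rule: an action is allowed only if every input license allows it."""
    sets = [PERMITTED[name] for name in license_names]
    return set.intersection(*sets) if sets else set()


def can_publish(*license_names):
    """True if the combined work may be distributed under the assumed rule."""
    return "distribute" in permitted_for_combination(*license_names)


# A generator could branch on this before emitting code meant for publication.
print(can_publish("LicA"))
print(can_publish("LicA", "LicB"))
```

SPDX expressions alone cannot answer this, because they name licenses without saying what each one permits.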