Adding attribute of distribution relevance at license findings / files / folders
(I am sorry if this is already in the current set of issues; I looked through all of them.)
When there are scan-detected or concluded licenses, fossology can maintain information (at the user's choice) about their "normal" distribution context (in quotes, because it is difficult to be precise here):
- Does this file actually land in the binary of a software product, or is it one of the following?
  - test code and documents
  - documentation
  - infrastructure scripts, files for building
Would others also be interested in maintaining this?
I am uncertain whether this actually belongs in an SPDX file, but at least fossology has started to maintain such information, allowing the license expert to express that the licensing found is not relevant for the usual distribution use case.
I am aware that files in the mentioned areas can still be used: developers might reuse test code in their products, or use image files from the documentation, etc. But in the majority of cases this does not happen, and for tooling automation it would be helpful to rule out such license findings, for example when creating product documentation.
I think this is covered by https://spdx.github.io/spdx-spec/4-file-information/#43-file-type already.
The only thing that is unclear to me is: How do I reflect that some files have not been analysed?
E.g. a doc folder is marked as file type DOCUMENTATION, but not every bundled jQuery plugin that is used to show a nice doc page has been analysed.
@jlovejoy @goneall What do you think about it?
@bufferoverflow hm, that is not quite my idea. I see 4.3.1 as the file type in the file-system sense, and it does not make clear whether a file is distribution relevant (Text, Image, Audio, Binary, Other ...) or not (source code for testing).
I am asking to distinguish between files that are relevant for distribution (in most cases source code, but not always) and files that are not relevant for distribution (as listed above: documentation, test data, test source code, etc.).
Sorry if that was not clear, but on re-reading, my description still appears to be OK.
This sounds like, or is at least similar to, the concept of a "use case": that is, how the file is actually being used (distributed, not distributed, etc.). This is obviously very useful information for compliance purposes and an interesting idea for an optional field in the spec. But I'd think we'd need to do quite a bit of thinking about how to define and express it. Maybe something to discuss on a joint tech/legal team call in 2019?
I believe the above can be (mostly) handled by a combination of file types and relationships.
Here's a proposed approach:
- Create an SPDX element (a file or an external SPDX document) to represent the product being shipped. If you want this to be a binary, you would have it be an SPDX file with a file type of binary. If you want it to represent a more abstract "product" you could reference an SPDX package (e.g. "shipping-product").
- Create a relationship between a file and the above created element. The relationship would describe if it is compiled into the shipping package, or used in documentation etc. I think we have relationship types already defined for most of these.
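To make the two steps above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Element` and `Relationship` classes are hypothetical stand-ins and not the API of any SPDX library, the SPDX IDs and file names are made up, and treating `TEST_CASE_OF`, `DOCUMENTATION_OF` and `BUILD_TOOL_OF` as "not shipped" is a policy choice for this example rather than something the spec prescribes.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for SPDX elements and relationships.
@dataclass(frozen=True)
class Element:
    spdx_id: str
    name: str


@dataclass(frozen=True)
class Relationship:
    source: str    # SPDX ID of the file
    rel_type: str  # relationship type name
    target: str    # SPDX ID of the related element (here: the product)


# Assumption for this sketch: these relationship types mean the file
# does not end up in the shipped product.
NON_SHIPPING_TYPES = {"TEST_CASE_OF", "DOCUMENTATION_OF", "BUILD_TOOL_OF"}


def distribution_relevant(files, relationships, product_id):
    """Return the files that have no non-shipping relationship to the product."""
    excluded = {
        r.source
        for r in relationships
        if r.target == product_id and r.rel_type in NON_SHIPPING_TYPES
    }
    return [f for f in files if f.spdx_id not in excluded]


# Step 1: an element that represents the product being shipped.
product = Element("SPDXRef-shipping-product", "shipping-product")

files = [
    Element("SPDXRef-main", "src/main.c"),
    Element("SPDXRef-test", "test/test_main.c"),
    Element("SPDXRef-doc", "doc/index.html"),
]

# Step 2: relationships between each file and the product element.
relationships = [
    Relationship("SPDXRef-main", "CONTAINED_BY", product.spdx_id),
    Relationship("SPDXRef-test", "TEST_CASE_OF", product.spdx_id),
    Relationship("SPDXRef-doc", "DOCUMENTATION_OF", product.spdx_id),
]

relevant = distribution_relevant(files, relationships, product.spdx_id)
print([f.name for f in relevant])
```

A tool generating product documentation could then restrict itself to the license findings of the files in `relevant`.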
All of that being said, it would be much simpler if we just had a flag for an individual file to indicate if something is actually distributed/deployed/conveyed.
In my audit work, I always include a field for each file indicating if it is "not distributed" as it is typically very relevant to the license obligations. Including such a field would benefit some of the tooling I use.
Thank you @jlovejoy and @goneall. I'm offline during January/February, but I'm pretty sure that Oliver Fendt will get in touch with you.
It sounds like there would still be a large document with all files and their license assessments, with some of them then pointing to a separate document or a use case?
I agree that this would work; however, I feel there is a slight deviation from the original intention. The purpose was to reduce the delivered information by keeping items out so that they are not part of the "shipping-product", not to increase it.
Agreed that representing a "use case" can be useful. I think the challenge with putting it as an attribute on a package or file is that a single package or file can have many different use cases depending on whose perspective it's being described from. (I think this echoes some of the comments above as well.)
Just to make up an example -- and this may not be useful, so feel free to disregard =) -- just trying to think through this:
Suppose open source project ABC releases code in a package that contains three subdirectories: server/, with code intended to run on a server; client/, with JavaScript code intended to be sent over the wire to clients' web browsers; and test/, with test code for the server and client files.
- From the project's perspective, all of this is distributed code. The project is distributing all of it to downstream recipients.
- From the perspective of a company offering it as a cloud / SaaS product, they'd presumably say that the test code is not distributed by them (just used internally); the server code is hosted by them; and the client code is distributed to their users.
- From the perspective of someone reusing a file in another project, it could have any use case at all. As long as they comply with the license, they could take the file and reuse it in any conceivable use case.
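A rough sketch of why this makes a single per-file attribute awkward: the same path maps to a different use case depending on who is describing it. The perspective names and use-case labels below are made up for this example and are not SPDX terms.

```python
# Hypothetical perspectives and use-case labels for project ABC's three
# subdirectories; none of these strings come from the SPDX spec.
USE_CASES = {
    "upstream-project": {
        "server/": "distributed",
        "client/": "distributed",
        "test/": "distributed",
    },
    "saas-provider": {
        "server/": "hosted",
        "client/": "distributed",
        "test/": "internal-only",
    },
}


def use_case(perspective, path):
    """Look up the use case of a file path as seen from one perspective."""
    for prefix, label in USE_CASES[perspective].items():
        if path.startswith(prefix):
            return label
    return "unknown"


# The same file gets two different answers:
print(use_case("upstream-project", "test/test_api.js"))
print(use_case("saas-provider", "test/test_api.js"))
```

The third perspective (someone reusing a file elsewhere) has no fixed entry at all, which is the point: the lookup key is the pair of perspective and file, not the file alone.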
@goneall's comments about Relationships seem to me to reflect the (currently) right way to handle this. E.g., to be able to say "this file is a TEST_CASE_OF this other file" -- or package, etc. I guess I'm just echoing some of the comments above that "use case" isn't necessarily an intrinsic fact about a Package or File. But I could be persuaded otherwise.
Per @jlovejoy's comment, it could be a useful topic for a joint tech / legal team meeting. Or perhaps a discussion at an SPDX monthly general meeting?
see pull request #145
@kestewart - is this resolved with PR #145? If so, please close.
Since this is still marked as "licensing", I'm closing this as sufficiently resolved by #145 and/or stale as having no further comments for 5+ years. Please feel free to reopen if you disagree. Thanks!