spdx-spec Snippet range internal consistency

Snippet range internal consistency

Open davaya opened this issue 3 years ago • 5 comments

https://spdx.github.io/spdx-spec/5-snippet-information/#53-snippet-byte-range contains a tag example

SnippetByteRange: 310:420

and an RDF example

<Snippet rdf:about="...">
    <range>
        <ptr:StartEndPointer>
            <ptr:startPointer>
                <ptr:ByteOffsetPointer>
                    <ptr:reference rdf:resource="#SPDXRef-fileReference/>
                    <ptr:offset>310</ptr:offset>
                </ptr:ByteOffsetPointer>
            </ptr:startPointer>
            <ptr:endPointer>
                <ptr:ByteOffsetPointer>
                    <ptr:reference  rdf:resource="#SPDXRef-fileReference/>
                    <ptr:offset>420</ptr:offset>
                </ptr:ByteOffsetPointer>
            </ptr:endPointer>
        </ptr: StartEndPointer>
    </range>
</Snippet>

The tag example is self-consistent - the range start and the end are both offsets, and they refer to the same file identified in section 5.2 SnippetFromFile:

    <snippetFromFile rdf:about="#SPDXRef-filecontainingsnippet">
        ...
    </Snippet>

The RDF example adds a huge amount of detail, all of which serves only as an attack surface or opportunity for error.

Consider this example with intentional inconsistencies:

<Snippet rdf:about="#SPDXRef-filecontainingsnippet">
    <range>
        <ptr:StartEndPointer>
            <ptr:startPointer>
                <ptr:ByteOffsetPointer>
                    <ptr:reference rdf:resource="#SPDXRef-file2Reference/>
                    <ptr:offset>310</ptr:offset>
                </ptr:ByteOffsetPointer>
            </ptr:startPointer>
            <ptr:endPointer>
              <ptr:LineCharPointer>
                  <ptr:reference rdf:resource="#SPDXRef-file3Reference"/>
                  <ptr:lineNumber>23</ptr:lineNumber>
              </ptr:LineCharPointer>
            </ptr:endPointer>
        </ptr: StartEndPointer>
    </range>
</Snippet>

The start and end refer to two different resources, each of which is different from the filecontainingsnippet, and the end pointer is a line number instead of a byte offset. Do any tools detect that start and end pointers must be the same type, and that resources must agree with the rdf:about?

The JSON data example follows the RDF structure rather than the tag structure, and the JSON schema gives up and doesn't attempt to validate the data.

  "snippets" : [ {
    "SPDXID" : "SPDXRef-Snippet",
    "comment" : "...",
    "copyrightText" : "...",
    "licenseComments" : "...",
    "licenseConcluded" : "GPL-2.0-only",
    "licenseInfoInSnippets" : [ "GPL-2.0-only" ],
    "name" : "from linux kernel",
    "ranges" : [ {
      "endPointer" : {
        "lineNumber" : 23,
        "reference" : "SPDXRef-DoapSource"
      },
      "startPointer" : {
        "lineNumber" : 5,
        "reference" : "SPDXRef-DoapSource"
      }
    }, {
      "endPointer" : {
        "offset" : 420,
        "reference" : "SPDXRef-DoapSource"
      },
      "startPointer" : {
        "offset" : 310,
        "reference" : "SPDXRef-DoapSource"
      }
    } ],
    "snippetFromFile" : "SPDXRef-DoapSource"
  } ],

Suggest that the JSON "ranges" property be replaced with the far simpler, easier to validate, and less error prone structure consistent with the tag format:

{
    "lineRanges": [[5, 23], [56, 68]],
    "byteRanges": [[310, 420]],
    "snippetFromFile" : "SPDXRef-DoapSource"
}

and that a shacl shape consistent with this structure be used in the v3 ontology.

Jul 06 '21 21:07 davaya

There is a proposal to simplify this in version 3.0 of SPDX. So for, no one has objected to simplifying.

The general syntax is fine with me, but I would prefer to have a separate Snippet object for each range.

A snippet is defined to be a range of code, if there is code from a different origin in between ranges, I would consider that 2 separate snippets.

This would make it easier to correct if a snippet was incorrectly ID'd or if 2 separate consecutive lines were from different origins.

Jul 08 '21 04:07 goneall

BTW - the reason for the rather complex syntax in RDF is we wanted to implement an existing W3C draft standard for pointers.

Jul 08 '21 04:07 goneall

A separate Snippet per code range is good.

I'd think that if W3C were to create a pointer range, they would come up with something that specified each bit of data only once. The "bytes with optional line numbers" for a single code range is semantically different from two separate pointer ranges specifying bytes from code and lines from code.

Ask WWW3CD? 😊

{
    ...
    "name" : "from linux kernel",
    "snippetFromFile" : "SPDXRef-DoapSource",
    "byteRange": [310, 420],
    "lineRange": [5, 23]
}

Jul 08 '21 12:07 davaya

Big +1 to this. It's made worse by the fact that byte range is a required property for snippets according to https://spdx.github.io/spdx-spec/v2.3/snippet-information/#93-snippet-byte-range-field. Any tool supporting snippets has to deal with all of the possible inconsistencies pointed out by @davaya if it offers validation, which is a major and unnecessary pain. I am confused that this issue is not more prominent - are snippets not used much?

In my opinion, the json spec made the situation even worse:

It only requires either a byte range or a line range, which is inconsistent with the tag/value spec: https://github.com/spdx/spdx-spec/blob/v2.3/schemas/spdx-schema.json#L651-L700 (maybe an issue of itself)
It allows for any number of ranges (instead of exactly one byte range and up to one line range)
startPointer and endPointer can have both offset and lineNumber
startPointer and endPointer could also have neither offset nor lineNumber

Nov 10 '22 13:11 nicoweidner

I am confused that this issue is not more prominent - are snippets not used much?

Snippets are not very common, but they are used.

The main reason this proposal hasn't been accepted yet is that it would be a breaking change - hence targeted for 3.0. I'm pretty sure it will get into 3.0 since we all agree it needs to be addressed.

Nov 11 '22 15:11 goneall

Resolved with 3.0.

Apr 04 '24 16:04 goneall

spdx-spec spdx-spec copied to clipboard

Snippet range internal consistency

spdx-spec
spdx-spec copied to clipboard