Add default container properties

Open davaya opened this issue 2 years ago • 7 comments

Per @zvr in issue #393, the default values of minCount and maxCount are implicitly:

  • minCount: 0
  • maxCount: *

This PR adds these defaults explicitly to the model, and adds default values applicable to properties where maxCount is greater than 1:

  • isOrdered: false
  • isUnique: true
  • isOptional: false

The four combinations of isOrdered and isUnique specify the container type:

Type     isOrdered   isUnique
Set      false       true
List     true        false
Record   true        true
Bag      false       false
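
As an illustration only, the table above can be read as a lookup from the two flags to a container type. The sketch below is Python; `CONTAINER_TYPES` and `container_type` are hypothetical names, not part of the SPDX model or spec-parser, and the keyword defaults mirror the defaults proposed in this issue.

```python
# Hypothetical helper for illustration; not part of the SPDX model or spec-parser.
# Keys are (isOrdered, isUnique) pairs, values are the type names from the table.
CONTAINER_TYPES = {
    (False, True): "Set",
    (True, False): "List",
    (True, True): "Record",
    (False, False): "Bag",
}


def container_type(is_ordered: bool = False, is_unique: bool = True) -> str:
    """Return the container type implied by the two flags (default: Set)."""
    return CONTAINER_TYPES[(is_ordered, is_unique)]
```

Calling `container_type()` with no arguments yields the proposed default, Set.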

The default type is Set, where duplicate values are an error and ordering does not affect equality: [a, b] is equal to [b, a]. Ordered lists (List) and ordered sets (Record) are less frequently used - where isOrdered is true, [a, b] is not equal to [b, a]. Bag is rare, but if [a, a, b] must be permitted as a valid value, equal to [a, b, a] and not equal to [a, b], the model can define it. These defaults will rarely be overridden by type definitions, but if it is necessary to define a list of values where order matters, it should be possible to do so.
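
The validity and equality rules described above could be sketched in Python roughly as follows. This is an assumption-laden illustration, not how spec-parser or any SPDX tooling works: `is_valid` and `containers_equal` are hypothetical names, and values are assumed to be hashable.

```python
from collections import Counter


def is_valid(values, is_unique=True):
    """Duplicates are an error only when is_unique is true."""
    values = list(values)
    return (not is_unique) or len(set(values)) == len(values)


def containers_equal(a, b, is_ordered=False):
    """Order matters only when is_ordered is true.

    Unordered containers are compared as multisets, so duplicates
    still count when comparing Bags.
    """
    if is_ordered:
        return list(a) == list(b)
    return Counter(a) == Counter(b)
```

Under these assumptions, `["a", "b"]` and `["b", "a"]` compare equal unless `is_ordered=True`, and `["a", "a", "b"]` is valid only when `is_unique=False`.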

If set to true, isOptional declares that a property with maxCount > 1 may be omitted regardless of the value of minCount. For example, a property x may have minCount=3, maxCount=5, and isOptional=true to declare that the property may be omitted, but if present it must have the specified item count. This is normally used with minCount=1, maxCount=*, isOptional=true to eliminate the ambiguity of having two different representations of nil:

  • property x omitted, and
  • property x present with value [ ].
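
A rough Python sketch of the count check with isOptional, for illustration only. `count_ok` is a hypothetical name; `None` stands for an omitted property and `max_count=None` stands for `*`. Treating an omitted property as valid when minCount is 0 is an assumption, not something stated in this issue.

```python
from typing import Optional, Sequence


def count_ok(
    values: Optional[Sequence],
    min_count: int = 0,
    max_count: Optional[int] = None,  # None stands for "*"
    is_optional: bool = False,
) -> bool:
    """Check the item count of a property; values=None means 'omitted'."""
    if values is None:
        # Assumption: omission is allowed if isOptional is true or minCount is 0.
        return is_optional or min_count == 0
    if len(values) < min_count:
        return False
    return max_count is None or len(values) <= max_count
```

With minCount=1, maxCount=*, isOptional=true, `count_ok(None, 1, None, True)` accepts the omitted property while `count_ok([], 1, None, True)` rejects the empty container, so only one representation of nil remains.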

davaya avatar Jun 29 '23 19:06 davaya

The logical model is independent of all serializations, not specific to RDF or to JSON. The logical class notation is independent of both serializations and programming languages - Armin is implementing the classes in Python code, but the notions of "class", "property" and "type" aren't Python-specific.

UML section 7.5 defines Multiplicity Elements (not SPDX Elements, but generic containers) with the four container properties.

RDF also supports containers, about which the 1999 spec says:

The definitions of Bag and Sequence explicitly permit duplicate values. RDF does not define a core concept of Set, which would be a Bag with no duplicates, because the RDF core does not mandate an enforcement mechanism in the event of violations of such constraints. Future work layered on the RDF core may define such facilities.

Sean should be able to answer if RDF has matured in the last 24 years to support sets. It's hard to believe it does not, but even if it doesn't, that's a matter of enforcement, not syntax. The model files can define all four container types and particular serializations can ignore enforcement if they don't support it.

There's a huge benefit to supporting isUnique: a List or Bag with [1..*] could contain hundreds or millions of items, but if CreationInfo/profile is a Set or Record with [1..*] items, it can contain no more than 8, one per defined profile. There is no need to count the number of profiles defined in the model, set maxCount to 8, and update it each time a new profile is defined; you can leave maxCount=* and isUnique=true, and valid serialized data automatically tracks the number of profiles.
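
To make the bound concrete, here is a small Python sketch under stated assumptions: `ALLOWED_PROFILES` uses eight placeholder names (the real profile identifiers are defined by the model), and `validate_unique` is a hypothetical helper. Because values must be both allowed and unique, no valid container can hold more items than there are allowed values, without maxCount ever being set.

```python
# Placeholder names for illustration; the real profile identifiers
# are defined by the model, not by this sketch.
ALLOWED_PROFILES = frozenset(f"profile{i}" for i in range(1, 9))


def validate_unique(values, allowed=ALLOWED_PROFILES):
    """Return True if every value is allowed and no value repeats."""
    values = list(values)
    no_duplicates = len(set(values)) == len(values)
    all_allowed = set(values) <= set(allowed)
    return no_duplicates and all_allowed
```

Adding a ninth placeholder to `ALLOWED_PROFILES` would raise the effective limit to 9 with no change to the validation logic.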

Also not sure if that is the right place to put these things. Profiles probably do not want to overwrite that (and giving them the possibility feels weird).

I agree, and am open to suggestions for where best to include them in the model.

Right now Core says "The Core namespace defines foundational concepts serving as the basis for all SPDX-3.0 profiles.", which means the alternatives are:

  1. use the existing syntax along with a statement that profiles shall not override them, or
  2. make up a new and different syntax that would have to be defined and supported.

It seems easier to enforce the no-override policy in software than to make up something different, but either way is OK.

davaya avatar Jul 09 '23 16:07 davaya

The logical model is independent of all serializations, not specific to RDF or to JSON.

I think one of our disconnects is thinking of RDF as a serialization. RDF is a data model with several different serialization formats. See the Wikipedia definition of RDF.

goneall avatar Jul 09 '23 18:07 goneall

After leaving that last comment, I thought I should follow up with my opinion: RDF is a data model we should support, but I don't believe it is perfect, and I'm quite open to supporting additional data models if they solve use cases RDF does not solve.

goneall avatar Jul 09 '23 18:07 goneall

@maxhbr - have the changes you were looking for been made?

kestewart avatar Jan 16 '24 18:01 kestewart

@davaya Would this be covered by ElementCollection in the current model as of 2024-01-16?

nishakm avatar Jan 16 '24 18:01 nishakm

No, I think that isOrdered, isUnique and isOptional are not yet supported by spec-parser, and this PR contains no documentation that describes their semantics. They also come from the object-oriented view of the model, and RDF people might disagree with the concept.

maxhbr avatar Jan 16 '24 18:01 maxhbr

Since this isn't supported by the spec parser, I'm moving this to a 3.1 release for consideration.

goneall avatar Apr 03 '24 19:04 goneall