
proposal: merge `subject.source` and `subject.id` into a global reference / uri / id

Open davidB opened this issue 8 months ago • 2 comments

Problem

Some aspects of how subject.source and subject.id are used trouble me:

  • The source of a subject is optional, and if it is not set, it should default to context.source. But the context (source + id) identifies an event, not a subject.
  • A reference to a subject from another subject's predicate sometimes uses:
    • xxx.source + xxx.id (e.g. change in artifact.packaged, service and environment in incident.reported); here too source is optional, but it has no default
    • xxxId (e.g. artifactId in service.deployed)
  • Sometimes a predicate references a subject that may not "exist" as a CDEvent, or that is not managed by the emitter of the current event (e.g. the emitter of service events references an artifact, or an environment for which no event may ever have been emitted). When such events do exist, the references should be "compatible" (I can give more details about "compatibility").
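To make the first point concrete, here is a minimal Python sketch of how a consumer resolves a subject's identity under the current rules. It assumes events are plain dicts parsed from JSON; `effective_subject_key` is a hypothetical helper, not part of any CDEvents SDK.

```python
from typing import Optional, Tuple


def effective_subject_key(event: dict) -> Tuple[str, str]:
    """Return the (source, id) pair identifying the event's subject.

    Hypothetical helper: subject.source is optional and, when absent,
    the subject is assumed to share the event's context.source.
    """
    subject = event["subject"]
    source: Optional[str] = subject.get("source")
    if not source:
        # Fall back to the *event* source. This is the ambiguity raised
        # above: context.source describes where the event happened, not
        # where the subject lives.
        source = event["context"]["source"]
    return source, subject["id"]
```

One consequence of this fallback: two emitters sending events about the same service produce different keys unless they both set an explicit subject.source.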

(Personal usage) I also use subject.id to name the subject (e.g. a service), but because a name alone may not be unique in a global/organization/environment context, I complete the id with a namespace or group.

Proposal

  • Merge subject.source into subject.id, so that subject.id becomes a URI-reference. Migration: subject.id = {subject.source}/{subject.id} (no defaulting to context.source; concatenate only if subject.source is not empty)

        {
          "properties": {
            "id": {
              "type": "string",
              "minLength": 1
            },
            "source": {
              "type": "string",
              "minLength": 1,
              "format": "uri-reference"
            }
          },
          "additionalProperties": false,
          "type": "object",
          "required": [
            "id"
          ]
        }
    

    becomes

        {
          "properties": {
            "id": {
              "type": "string",
              "minLength": 1,
              "format": "uri-reference"
            }
          },
          "additionalProperties": false,
          "type": "object",
          "required": [
            "id"
          ]
        }
    
  • For references, fix the inconsistency by either:

    1. changing every xxxId into xxx.id (URI to struct), or
    2. changing every xxx.id (with xxx.source merged in) into xxxId (struct to URI)
  • Maybe rename id to uri, which would also allow merging the separate id and uri fields defined in some subjects, such as ticket
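The migration rule and the two reference options can be sketched in Python as below. This is only an illustration under assumptions: subjects and references are plain dicts, and `merge_subject_id`, `reference_to_struct` and `reference_to_uri` are hypothetical names, not existing CDEvents SDK functions.

```python
from typing import Optional


def merge_subject_id(subject: dict) -> dict:
    """Apply the proposed migration: subject.id = {subject.source}/{subject.id}.

    No defaulting to context.source; concatenate only when subject.source
    is present and non-empty. Other fields (type, content, ...) are kept.
    """
    merged = {k: v for k, v in subject.items() if k != "source"}
    source: Optional[str] = subject.get("source")
    if source:
        # Literal join as written in the proposal; a trailing "/" in
        # source is not normalized here.
        merged["id"] = f"{source}/{subject['id']}"
    return merged


def reference_to_struct(ref_id: str) -> dict:
    """Option 1 (URI to struct): a flat xxxId string becomes xxx: {"id": ...}."""
    return {"id": ref_id}


def reference_to_uri(ref: dict) -> str:
    """Option 2 (struct to URI): xxx: {"source", "id"} becomes one xxxId string."""
    return merge_subject_id(ref)["id"]
```

With either option, a reference and the subject it points to end up compared on a single string field instead of an optional (source, id) tuple.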

Pros

  • better consistency between identifications and references
  • clarification that subject.id (subject.uri) is the absolute reference to the subject (this doesn't mean it must be an absolute URI, but conventions could emerge, like purl)
  • no more confusion about the subject.source and the split between source and id to have uniqueness and meaningful info
  • Simplification of references and of subject definitions
  • Simpler to manage one field than a tuple

Cons

  • breaking change

Edit: partial copy/paste from https://github.com/cdevents/spec/blob/main/spec.md, as a reminder:

id (subject)

  • Type: String
  • Description: Identifier for a subject. Subsequent events associated to the same subject MUST use the same subject id.

source (subject)

  • Type: URI-Reference

  • Description: defines the context in which the subject originated. In most cases the source of the subject matches the source of the event. This field should be used only in cases where the source of the subject is different from the source of the event.

    The format and semantics of the subject source are the same as those of the context source.

source (context)

  • Type: URI-Reference

  • Description: defines the context in which an event happened. The main purpose of the source is to provide global uniqueness for source + id.

    The source MAY identify a single producer or a group of producers that belong to the same application.

    When selecting the format for the source, it may be useful to think about how clients may use it. Using the root use cases as reference:

    • A client may want to react only to events sent by a specific service, like the instance of Tekton that runs in a specific cluster or the instance of Jenkins managed by team X
    • A client may want to collate all events coming from a specific source for monitoring, observability or visualization purposes
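The two client use cases above, together with the stated role of source + id as a globally unique key, can be sketched as a small Python filter. Assumptions: events are plain dicts, sources are matched by exact string equality (a real client might match by prefix), and `events_from_source` is a hypothetical name.

```python
from typing import Iterable, Iterator


def events_from_source(events: Iterable[dict], wanted_source: str) -> Iterator[dict]:
    """Yield each distinct event emitted by wanted_source.

    (context.source, context.id) is treated as the globally unique event
    key, so repeated deliveries of the same event are dropped.
    """
    seen = set()
    for event in events:
        ctx = event["context"]
        key = (ctx["source"], ctx["id"])
        # Skip events from other producers and duplicates of seen events.
        if ctx["source"] != wanted_source or key in seen:
            continue
        seen.add(key)
        yield event
```

Note that this key identifies events only; nothing in it identifies the subject the events are about.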

davidB, May 05 '25 13:05

As discussed in the SIG, I believe this issue stems from very vague definitions and descriptions of our fields. I do not think we need to merge source and id, given that they mean two very different things.

Instead, we need to take the time to document and define, in an opinionated manner, what each field means.

xibz, May 05 '25 16:05

To illustrate, here is a sample from https://github.com/cdviz-dev/cdviz-collector/tree/main/examples/assets/outputs/transform-github_events (cdviz does not use subject.source when it converts GitHub events):

    {
      "context": {
        "id": "0",
        "source": "https://api.github.com/repos/cdviz-dev/cdviz-collector/actions/jobs/36016488192",
        "timestamp": "2025-01-22T19:01:11+00:00",
        "type": "dev.cdevents.taskrun.started.0.2.0",
        "version": "0.4.1"
      },
      "subject": {
        "content": {
          "pipelineRun": {
            "id": "12915191049/1"
          },
          "taskName": "cdviz-dev/cdviz-collector/MegaLinter/MegaLinter",
          "url": "https://github.com/cdviz-dev/cdviz-collector/actions/runs/12915191049/job/36016488192"
        },
        "id": "36016488192/1",
        "type": "taskRun"
      }
    }
  • The subject.id is composed of 2 parts, `{id of the job}/{attempt_number}`, because a job can be launched several times with the same id, and each launch triggers a different event
  • The subject.id cannot be combined with the context.source, as the source is the job that triggers the taskRun
  • subject.content.pipelineRun.id is also composed of 2 parts (for the same reason)
  • The context.id is unique on its own ("0" is replaced by a content id (hash) generated when the event is built and sent), so it cannot be combined with context.source.
  • Maybe context.source should not reference GitHub, as this event is not sent by GitHub but is a translation of a GitHub event.
  • taskName is long because dashboards display the taskName (and search/filter by it), and the last segment alone is not enough to identify the task: a task can be shared by several workflows of a project, and several projects share task names (e.g. ci, build, ...).
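The composite-id scheme from the bullets above can be sketched in Python as follows. The helper names are hypothetical and the values are copied from the sample; the point is that a retry yields a distinct subject.id, while prefixing it with context.source (as a merged URI naively would) repeats the job id.

```python
def task_run_subject_id(job_id: int, attempt: int) -> str:
    """Build the taskRun subject.id used in the sample: {id of the job}/{attempt_number}."""
    return f"{job_id}/{attempt}"


def naive_global_id(context_source: str, subject_id: str) -> str:
    """Hypothetical: concatenate context.source and subject.id into one string."""
    return f"{context_source}/{subject_id}"


SOURCE = "https://api.github.com/repos/cdviz-dev/cdviz-collector/actions/jobs/36016488192"

first = task_run_subject_id(36016488192, 1)  # first launch of the job
retry = task_run_subject_id(36016488192, 2)  # same job id, new attempt
combined = naive_global_id(SOURCE, first)    # ".../jobs/36016488192/36016488192/1"
```

Because context.source already ends with the job id, the combined string contains it twice, which is why the sample keeps the two fields separate.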

I agree that we need clarification, opinionated definitions, and "real" examples for the fields. source and id are a good place to start, but this should be done for each subject type.

davidB, May 05 '25 16:05