proposal: merge `subject.source` and `subject.id` into a global reference / uri / id
Problem
Some points about `subject.source` and `subject.id` trouble me (in practice):
- The `source` of a subject is optional, and if it is not set, it should default to the `context.source`. But the context (`source` + `id`) references an event, not a subject.
- Sometimes a reference to a subject in another subject's predicate uses:
  - `xxx.source` + `xxx.id` (e.g. `change` in `artifact.packaged`, `service` and `environment` in `incident.reported`); in this case `source` is also optional (but has no default)
  - `xxxId` (e.g. `artifactId` in `service.deployed`)
- Sometimes a predicate could reference a subject that may not "exist" via a CDEvent, or that is not managed by the emitter of the current event (e.g. the emitter of an event about a service references an artifact, or an environment for which no event may have been emitted). If such events do exist, the references should be "compatible" (I can give more details about "compatibility").
(Personal usage) I also use `subject.id` to name the subject (e.g. a service), but as the name may not be unique enough in a global/organization/environment context, the id is completed with a namespace or group.
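To make the asymmetry concrete, here is a minimal Python sketch of how a consumer might resolve sources and references under the current rules. It assumes events are plain dicts shaped like the JSON in the spec; the helper names and the sample values are hypothetical, and the sample is only loosely modeled on `service.deployed`.

```python
from typing import Optional


def subject_source(event: dict) -> str:
    # Subject rule: `subject.source` is optional and falls back to `context.source`,
    # even though `context` (source + id) identifies the event, not the subject.
    return event["subject"].get("source") or event["context"]["source"]


def reference_source(ref: dict) -> Optional[str]:
    # Struct reference inside a predicate (e.g. `environment`): `source` is
    # optional too, but there is no fallback, so it may simply be missing.
    return ref.get("source")


def reference_id(content: dict, name: str) -> Optional[str]:
    # The same kind of reference can be a struct (`xxx.id`) or a flat field (`xxxId`).
    value = content.get(name)
    if isinstance(value, dict):
        return value.get("id")
    return content.get(f"{name}Id")


# Hypothetical event mixing both reference shapes (values are made up).
event = {
    "context": {"source": "/ci/pipeline", "id": "42"},
    "subject": {
        "id": "my-app",
        "content": {
            "artifactId": "pkg:generic/my-app@1.0.0",
            "environment": {"id": "prod"},
        },
    },
}

print(subject_source(event))  # prints /ci/pipeline (inherited from the context)
```

A consumer has to know, per field, which of the three rules applies, which is exactly the inconsistency described above.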
Proposal
- Merge `subject.source` into `subject.id`: `subject.id` becomes a URI-reference.
  - Migration: `subject.id = {subject.source}/{subject.id}` (no defaulting to `context.source`; concatenate only if `subject.source` is not empty).
  - The subject schema

    ```json
    {
      "properties": {
        "id": { "type": "string", "minLength": 1 },
        "source": { "type": "string", "minLength": 1, "format": "uri-reference" }
      },
      "additionalProperties": false,
      "type": "object",
      "required": ["id"]
    }
    ```

    becomes

    ```json
    {
      "properties": {
        "id": { "type": "string", "minLength": 1, "format": "uri-reference" }
      },
      "additionalProperties": false,
      "type": "object",
      "required": ["id"]
    }
    ```

- For references, fix the inconsistency by either:
  - changing every `xxxId` into `xxx.id` (URI to struct), OR
  - changing every `xxx.id` (with `xxx.source` merged in) into `xxxId` (struct to URI).
- Maybe rename `id` to `uri`, which would allow replacing/merging the `id` and `uri` fields defined in some subjects, like `ticket`.
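The proposed migration can be sketched in a few lines of Python. This is a hypothetical helper operating on a subject dict, not part of any CDEvents SDK; stripping a trailing `/` from `source` is an extra assumption to avoid producing `ns//id`.

```python
def merge_source_into_id(subject: dict) -> dict:
    """Proposed migration: `subject.id = {subject.source}/{subject.id}`.

    Concatenates only when `source` is set and non-empty; never falls back to
    `context.source`. Returns a new dict without the `source` key.
    """
    merged = {key: value for key, value in subject.items() if key != "source"}
    source = subject.get("source")
    if source:
        # Assumption: strip a trailing "/" so "ns/" + "x" does not give "ns//x".
        merged["id"] = f"{source.rstrip('/')}/{subject['id']}"
    return merged


# A subject that used `source` as a namespace (hypothetical values).
print(merge_source_into_id({"id": "my-service", "source": "https://example.com/team-a", "type": "service"}))
# {'id': 'https://example.com/team-a/my-service', 'type': 'service'}
```

If the "struct to URI" option were chosen for references, the same concatenation could turn an `xxx.source` + `xxx.id` pair into a single `xxxId` value.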
Pros
- better consistency between identification and references
- clarification that `subject.id` (`subject.uri`) is the absolute reference to the subject (this doesn't mean it must be an absolute URI, but some convention could emerge, like purl)
- no more confusion about `subject.source` and the split between `source` and `id` to achieve uniqueness and meaningful info
- simpler references and subject definitions
- simpler to manage one field than a tuple
Cons
- breaking change
Edit: partial copy/paste from https://github.com/cdevents/spec/blob/main/spec.md, as a reminder:
id (subject)
- Type: [String][typesystem]
- Description: Identifier for a subject. Subsequent events associated to the same subject MUST use the same subject id.
source (subject)
- Type: [URI-Reference][typesystem]
- Description: defines the context in which the subject originated. In most cases the `source` of the subject matches the `source` of the event. This field should be used only in cases where the `source` of the subject is different from the `source` of the event.

  The format and semantic of the subject `source` are the same of those of the context `source`.
source (context)
- Type: [URI-Reference][typesystem]
- Description: defines the context in which an event happened. The main purpose of the source is to provide global uniqueness for `source` + `id`.

  The source MAY identify a single producer or a group of producer that belong to the same application.
When selecting the format for the source, it may be useful to think about how clients may use it. Using the root use cases as reference:
- A client may want to react only to events sent by a specific service, like the instance of Tekton that runs in a specific cluster or the instance of Jenkins managed by team X
- A client may want to collate all events coming from a specific source for monitoring, observability or visualization purposes
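As an illustration of what "global uniqueness for `source` + `id`" buys a client, here is a minimal Python sketch that deduplicates events on the `(context.source, context.id)` pair and filters by source for the first use case. The function names and sample sources are hypothetical.

```python
from typing import Iterable, Iterator


def unique_events(events: Iterable[dict]) -> Iterator[dict]:
    # (context.source, context.id) is the globally unique key; `id` alone is not.
    seen: set[tuple[str, str]] = set()
    for event in events:
        key = (event["context"]["source"], event["context"]["id"])
        if key not in seen:
            seen.add(key)
            yield event


def from_source(events: Iterable[dict], source: str) -> list[dict]:
    # Use case: react only to events sent by a specific service instance.
    return [e for e in events if e["context"]["source"] == source]


events = [
    {"context": {"source": "/tekton/cluster-a", "id": "1"}},
    {"context": {"source": "/tekton/cluster-a", "id": "1"}},  # duplicate delivery
    {"context": {"source": "/jenkins/team-x", "id": "1"}},    # same id, other source
]
print(len(list(unique_events(events))))  # prints 2
```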
As discussed in the SIG, I believe this is an issue of very vague definitions and descriptions of our fields. I do not think we need to merge `source` and `id`, given that they mean two very different things.
Instead, we need to take the time to document and define, in an opinionated manner, what each field is.
To illustrate, here is a sample from https://github.com/cdviz-dev/cdviz-collector/tree/main/examples/assets/outputs/transform-github_events (cdviz does not use `subject.source` when it converts GitHub events):
```json
{
  "context": {
    "id": "0",
    "source": "https://api.github.com/repos/cdviz-dev/cdviz-collector/actions/jobs/36016488192",
    "timestamp": "2025-01-22T19:01:11+00:00",
    "type": "dev.cdevents.taskrun.started.0.2.0",
    "version": "0.4.1"
  },
  "subject": {
    "content": {
      "pipelineRun": {
        "id": "12915191049/1"
      },
      "taskName": "cdviz-dev/cdviz-collector/MegaLinter/MegaLinter",
      "url": "https://github.com/cdviz-dev/cdviz-collector/actions/runs/12915191049/job/36016488192"
    },
    "id": "36016488192/1",
    "type": "taskRun"
  }
}
```
- The `subject.id` is composed of two parts, `{id of the job}/{attempt_number}`, because a job can be launched several times with the same id, while each launch triggers a different event.
- The `subject.id` could not be combined with the `context.source`, as the source is the job that triggers the taskRun.
- The `subject.content.pipelineRun.id` is also composed of two parts (same reason).
- The `context.id` is unique (`0` => content id (hash) generated when the event is built & sent), so it could not be combined with `context.source`.
- Maybe the `context.source` should not reference GitHub, as this event is not sent by GitHub but is a translation of a GitHub event.
- `taskName` is long because dashboards display the `taskName` (and search/filter by `taskName`), and the last part alone is not enough to identify the task (a task can be shared by the workflows of a project, and several projects share task names, e.g. `ci`, `build`, ...).
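The two id conventions above can be sketched in Python. `attempt_id` mirrors the `{id of the job}/{attempt_number}` format from the sample. `content_id` is a hypothetical illustration of a content-based `context.id` (SHA-256 over canonical JSON, excluding the id itself); the exact hashing cdviz-collector uses is not shown here and may differ.

```python
import hashlib
import json


def attempt_id(job_id: int, attempt: int) -> str:
    # A re-run keeps the job id, so the attempt number disambiguates each launch.
    return f"{job_id}/{attempt}"


def content_id(event: dict) -> str:
    # Hypothetical scheme: hash the event with `context.id` removed, so the id
    # depends only on the content and not on the placeholder value ("0").
    body = {**event, "context": {k: v for k, v in event["context"].items() if k != "id"}}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(attempt_id(36016488192, 1))  # prints 36016488192/1
```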
I agree that we need clarification, opinionated definitions and "real" examples for fields. `source` & `id` are a good place to start, but it should be done for each subject type.